Closed by @PeterDSteinberg 7 years ago
Sounds great. I'll create a new bucket and post details internally once I have them.
Also relevant to this issue, @gbrener, is a module in what is now earthio related to downloading from Amazon's LANDSAT S3 store. I made it in preparation for the AnacondaCON demo presentation. At that time, I also added a more general LANDSAT util related to spatial and band metadata, not to S3 or downloading. Neither of these LANDSAT-related modules is affected by earthio PR 1, except for the file move from elm/readers/ to earthio/.
Ok, good to know @PeterDSteinberg - thanks for the link. Before seeing your comment I wrote a similar script (https://github.com/ContinuumIO/elm-readers/blob/cd89b89108f0542d26b77044c4d4fb7a68b1ca63/scripts/download_test_data.py), except it's a bit more generic (although it currently expects files in the .tar.bz2 format, I wrote it with extensibility/flexibility in mind, so we can always add more formats). The choice of bzip2 over gzip was fairly arbitrary - conda uses bzip2 for packaging, but gzip is more ubiquitous - so I'm happy to switch to gzip if you have a strong preference for it. Please let me know your thoughts on whether I should combine the two scripts, and/or change the compression to gzip.
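For illustration, here is a minimal sketch of the "add more formats later" idea (the registry and function names are hypothetical, not the actual download_test_data.py API): a new archive format is supported by registering one more suffix -> extractor entry, without touching the download logic.

```python
import tarfile


def _untar(path, dest, mode):
    # Context manager ensures the archive file handle is closed
    # after extraction.
    with tarfile.open(path, mode) as archive:
        archive.extractall(dest)


# Hypothetical registry: suffix -> extractor callable. Adding gzip
# support later is a one-line change here.
EXTRACTORS = {
    ".tar.bz2": lambda path, dest: _untar(path, dest, "r:bz2"),
    ".tar.gz": lambda path, dest: _untar(path, dest, "r:gz"),
}


def extract_archive(path, dest):
    """Dispatch to the extractor whose suffix matches `path`."""
    for suffix, extract in EXTRACTORS.items():
        if str(path).endswith(suffix):
            extract(path, dest)
            return dest
    raise ValueError("No extractor registered for %s" % path)
```

Keeping the format choice in a table like this is what makes the bzip2-vs-gzip decision low-stakes: either could be swapped in later without restructuring the script.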
@gbrener, one idea is to keep s3_landsat_util.py in place with the LANDSAT-specific code, or alternatively to move the LANDSAT-specific code from s3_landsat_util.py to landsat_util.py and the S3 part of s3_landsat_util.py to your download_test_data.py script - up to you, whichever is easier. The s3_landsat_util.py I mentioned is specifically for downloading from AWS's LANDSAT store, with logic for finding scenes in a file called scene_list.gz downloaded from AWS, while yours is geared toward the move of what is now elm-data to S3 buckets we control. Here's the SceneDownloader class that we could optionally move to landsat_util.py or your download_test_data.py, or keep in place.

I also have some code from a notebook that finds the lowest-cloud-cover image in scene_list.gz from the AWS LANDSAT store - https://aws.amazon.com/public-datasets/landsat/ . I can commit my notebook to elm-examples soon, then generalize those changes and commit them to this project later. I think the AWS LANDSAT store (among others) is a good test data set for us: it is already a large data set of interest that is highly available without maintenance on our part, and there are interesting problems that can be tackled with a smaller subset of the data.
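As a rough sketch of the scene_list.gz logic described above (assuming the public scene list's CSV columns such as path, row, cloudCover, and entityId; the function names here are hypothetical, not the actual SceneDownloader API):

```python
import csv
import gzip


def read_scene_list(fname="scene_list.gz"):
    """Stream rows from the gzipped CSV scene list downloaded from AWS."""
    with gzip.open(fname, "rt") as f:
        yield from csv.DictReader(f)


def lowest_cloud_scene(rows, path, row):
    """Return the scene dict with the lowest cloudCover for a WRS-2
    path/row, or None if no scene matches.

    `rows` is any iterable of dicts, e.g. from read_scene_list().
    """
    candidates = [r for r in rows
                  if int(r["path"]) == path and int(r["row"]) == row]
    if not candidates:
        return None
    return min(candidates, key=lambda r: float(r["cloudCover"]))
```

Separating the CSV streaming from the cloud-cover selection would also make it easy to test the selection logic on small synthetic scene lists without touching S3.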
Ok, sounds good - I'm fine with keeping them separate.
Just so this is written down somewhere: in order to fully deprecate elm-data, we'll need to update the documentation so that it no longer references that repo. Based on an offline conversation with @PeterDSteinberg, we're planning to do this at a later, to-be-determined date.
TODO items/PRs remaining before we close this issue:
I deleted the elm-data repo, but first downloaded and zipped its latest contents, just to be safe. This logic is now handled by S3 downloading. See also #134.
Installing the example/test data from elm-data is not a hard requirement of elm. It is currently used in Travis CI and in some of the example notebooks. We will move this data to S3 because installing it currently requires git LFS, which involves a few git commands and system installs that some users are less familiar with. Use the datashader/examples/ approach for downloading sample data.
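A minimal sketch of the datashader/examples-style download-on-demand approach mentioned above (the bucket URL, data directory, and function name are placeholders, since the real bucket details are still TBD):

```python
import os
from urllib.request import urlretrieve

# Placeholder bucket layout; the real bucket name/paths are TBD.
DATA_URL = "https://s3.amazonaws.com/elm-test-data/{fname}"


def fetch_sample_data(fname, data_dir="~/.elm/data", base_url=DATA_URL):
    """Download `fname` into `data_dir` unless it is already present,
    so repeat runs (e.g. in CI or notebooks) hit the local cache.
    """
    data_dir = os.path.expanduser(data_dir)
    os.makedirs(data_dir, exist_ok=True)
    dest = os.path.join(data_dir, fname)
    if not os.path.exists(dest):  # skip re-download on repeat runs
        urlretrieve(base_url.format(fname=fname), dest)
    return dest
```

This avoids the git LFS dependency entirely: users only need Python's standard library, and CI only downloads the files a given test actually touches.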