developmentseed / public-datasets

[WIP] End to End solution to Create and Deploy a STAC API for public datasets
MIT License

[Future] feeder CLI should be run without input (e.g download the needed file itself) #6

Open: vincentsarago opened this issue 3 years ago

vincentsarago commented 3 years ago

https://github.com/developmentseed/public-datasets/blob/01a3fe2f443353f2c350ab1fb80c6c79085a3065/public_datasets/feeder/public_datasets/feeder/landsat/aws.py#L306-L307

Right now, to create the (Landsat) items, we need to:

  1. download the scene_list from AWS
  2. download the WRS2 grids and merge them
  3. run the CLI: python -m public_datasets.feeder.landsat.aws data/landsat/scene_list.csv data/landsat/WRS2_daynight.geojson
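Step 2 (merging the grids) could be sketched roughly as below. This assumes both WRS2 grids are already available as GeoJSON FeatureCollections; `merge_wrs2_grids` and the `daynight` property it adds are hypothetical names for illustration, not the pre-processing the repo actually uses.

```python
from typing import Any, Dict


def merge_wrs2_grids(day: Dict[str, Any], night: Dict[str, Any]) -> Dict[str, Any]:
    """Combine two GeoJSON FeatureCollections into a single one.

    Each feature gets a hypothetical "daynight" property so that the day and
    night footprints of the same path/row stay distinguishable after merging.
    """
    features = []
    for label, collection in (("day", day), ("night", night)):
        for feature in collection.get("features", []):
            # Copy the properties so the input collections are not mutated.
            props = dict(feature.get("properties") or {})
            props["daynight"] = label
            features.append({**feature, "properties": props})
    return {"type": "FeatureCollection", "features": features}
```

The merged dict can then be written out with `json.dump` to something like data/landsat/WRS2_daynight.geojson before running step 3.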

I think it would be simpler if the CLI could download the scene_list and a grid itself (we stored one in a public S3 bucket).
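A minimal sketch of "download it if the user didn't pass it" might look like this. `resolve_input` and `DEFAULT_CACHE_DIR` are hypothetical names, and the default URLs for the scene list and grid are not given in this thread, so they would have to be filled in. Note that `urllib.request.urlretrieve` handles `https://` and `file://` URLs but not `s3://`, so a public bucket would be reached through its HTTPS endpoint here.

```python
import os
import tempfile
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical default cache location; any writable directory works.
DEFAULT_CACHE_DIR = os.path.join(tempfile.gettempdir(), "public-datasets")


def resolve_input(
    path: Optional[str], default_url: str, cache_dir: str = DEFAULT_CACHE_DIR
) -> str:
    """Return a local path for a CLI input.

    If the user passed a path, it is returned unchanged. Otherwise the file at
    `default_url` is downloaded into `cache_dir` (only if not already cached)
    and the cached path is returned.
    """
    if path is not None:
        return path
    os.makedirs(cache_dir, exist_ok=True)
    # Name the cached file after the last segment of the URL path.
    filename = os.path.basename(urllib.parse.urlparse(default_url).path)
    local_path = os.path.join(cache_dir, filename)
    if not os.path.exists(local_path):
        # urlretrieve handles http(s):// and file:// URLs, not s3://.
        urllib.request.urlretrieve(default_url, local_path)
    return local_path
```

Caching by file name keeps repeated runs from re-downloading, at the cost of never noticing an updated scene list; a real implementation would need some invalidation rule.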

kylebarron commented 3 years ago

Personally I'd just tell the user to download them separately; that way the user can put them where they need to, and the library doesn't have to deal with downloading them.

I'd usually use a CLI like: feed-landsat --scene-list scene_list.gz --wrs2_grid grid
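With `argparse`, the interface proposed above could be sketched like this. `feed-landsat` is only the name suggested in this comment (it does not exist in the repo), and the flag spellings are copied as written, including the mix of `--scene-list` and `--wrs2_grid`.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical `feed-landsat` parser: both inputs are explicit paths."""
    parser = argparse.ArgumentParser(prog="feed-landsat")
    parser.add_argument(
        "--scene-list",
        required=True,
        help="Path to the Landsat scene list downloaded from AWS (e.g. scene_list.gz)",
    )
    parser.add_argument(
        "--wrs2_grid",
        required=True,
        help="Path to the merged WRS2 day/night grid",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argparse turns "--scene-list" into the attribute `scene_list`.
    return build_parser().parse_args(argv)
```

Keeping both inputs required means the library never touches the network; the user decides where the files live.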

I do the same in landsat-cogeo-mosaic:

landsat-cogeo-mosaic create-from-db \
    --sqlite-path data/scene_list.db \
    --pathrow-index data/pr_index.json.gz

vincentsarago commented 3 years ago

I'm not sure I want to create a proper feed-landsat CLI, because hopefully it only has to be run once. 🤷‍♂️ (I really don't like ending up with CLIs that are never used.)

I agree that downloading the scene_list might not be the best solution, especially if a user just wants to create a few items. The grid, on the other hand, needs some pre-processing, so I feel we should default to the one we produced.
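One way to express this compromise, sketched with `argparse` under stated assumptions: the scene list stays a required positional argument (as in the current `python -m public_datasets.feeder.landsat.aws` call), while the grid becomes optional and falls back to the pre-processed copy. `DEFAULT_WRS2_GRID` is a hypothetical placeholder, since the actual bucket and key are not given in this thread.

```python
import argparse
from typing import List, Optional

# HYPOTHETICAL placeholder: the real location of the pre-processed grid
# is not stated in this thread.
DEFAULT_WRS2_GRID = "https://<public-bucket>.s3.amazonaws.com/WRS2_daynight.geojson"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Create Landsat STAC items from an AWS scene list."
    )
    # The user always supplies the scene list; it can be a trimmed-down
    # copy when only a few items are needed.
    parser.add_argument("scene_list", help="Path to the scene list CSV")
    # The grid is optional and defaults to the pre-processed one.
    parser.add_argument(
        "wrs2_grid",
        nargs="?",
        default=DEFAULT_WRS2_GRID,
        help="Path or URL of the merged WRS2 day/night grid (default: %(default)s)",
    )
    return parser.parse_args(argv)
```

Because the second positional argument is only made optional, existing invocations that pass both files keep working unchanged.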