GeoscienceAustralia / dea-orchestration


Stac #92

Closed ashoka1234 closed 5 years ago

ashoka1234 commented 5 years ago

Reason for this pull request

STAC item and catalog files need to be created for certain products that are uploaded to the AWS S3 prod buckets, and these files must conform to the current STAC version, 0.6.0. It was decided that the files are to be created/updated separately from the product upload processes, and that this needs to be automated.
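As a rough illustration of what gets generated per dataset, here is a minimal sketch of a STAC 0.6.0 item. The field names follow the 0.6.0 spec, but the bbox, geometry, and asset values below are placeholders, not real product metadata:

```python
import json

# Minimal sketch of a STAC 0.6.0 item for one dataset.
# The id matches the dataset YAML name from this PR; the geometry,
# bbox, and asset href are illustrative placeholders only.
item = {
    "id": "LS8_OLI_FC_3577_-12_-12_20180222125938",
    "type": "Feature",
    "bbox": [146.0, -35.0, 147.0, -34.0],  # [west, south, east, north] (placeholder)
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [146.0, -35.0], [147.0, -35.0], [147.0, -34.0],
            [146.0, -34.0], [146.0, -35.0],
        ]],
    },
    "properties": {"datetime": "2018-02-22T12:59:38Z"},
    "links": [
        {"rel": "self",
         "href": "LS8_OLI_FC_3577_-12_-12_20180222125938_STAC.json"},
    ],
    "assets": {
        "BS": {"href": "LS8_OLI_FC_3577_-12_-12_20180222125938_BS.tif"},
    },
}
print(json.dumps(item, indent=2))
```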

Proposed Solution

The STAC item files are generated in an event-driven fashion by a serverless lambda triggered by messages arriving in an AWS SQS queue. When a YAML file corresponding to a dataset is uploaded to, or modified in, an S3 bucket, AWS is configured to send a message to this SQS queue. The parent catalog files are created/updated collectively and incrementally in batch mode. The following are the respective functions of the corresponding scripts:
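The event-driven path above can be sketched as a lambda handler: each SQS record body carries an S3 event notification naming the uploaded/modified dataset YAML, from which the sibling STAC item key is derived. This is a minimal sketch, not the PR's actual handler; the `_STAC.json` naming is taken from the example paths in this thread, and the S3 read/write is left as a comment:

```python
import json
from os.path import splitext

def stac_item_key(yaml_key):
    """Map a dataset YAML key to its sibling STAC item key
    (naming assumed from the example paths in this PR)."""
    base, _ = splitext(yaml_key)
    return base + "_STAC.json"

def handler(event, context=None):
    """Sketch of the SQS-triggered lambda: unwrap each SQS record,
    then each S3 event record inside it, and derive the item key."""
    written = []
    for record in event["Records"]:
        s3_event = json.loads(record["body"])
        for s3_record in s3_event["Records"]:
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            # The real lambda would read the YAML here, transform it into
            # a STAC 0.6.0 item, and write it back with boto3, e.g.:
            # s3.put_object(Bucket=bucket, Key=stac_item_key(key), Body=...)
            written.append((bucket, stac_item_key(key)))
    return written
```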

ashoka1234 commented 5 years ago

The configs are now separated out into their own file, stac_config.yaml.

You can test the lambda function and notify_to_stac_queue.py by running

python notify_to_stac_queue.py -b dea-public-data-dev fractional-cover/fc/v2.2.0/ls8/x_-12/y_-12/2018/02/22/LS8_OLI_FC_3577_-12_-12_20180222125938.yaml

and then verifying the resulting STAC item file at https://s3-ap-southeast-2.amazonaws.com/dea-public-data-dev/fractional-cover/fc/v2.2.0/ls8/x_-12/y_-12/2018/02/22/LS8_OLI_FC_3577_-12_-12_20180222125938_STAC.json
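In essence, a notify script like this only has to put an S3-event-shaped message on the queue so the lambda fires without an actual upload. This is a hedged sketch of that message construction (the exact shape notify_to_stac_queue.py emits is an assumption); the final boto3 send is shown as a comment:

```python
import json

def s3_notification_message(bucket, key):
    """Build a message in the shape of an S3 event notification
    (assumed shape), so the lambda can be exercised directly."""
    return json.dumps({
        "Records": [
            {"s3": {"bucket": {"name": bucket},
                    "object": {"key": key}}}
        ]
    })

# The real script would then push it to the queue with boto3, e.g.:
# sqs.send_message(QueueUrl=queue_url,
#                  MessageBody=s3_notification_message(bucket, key))
```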

Similarly, you can test the script stac_parent_update.py with a command like python stac_parent_update.py -b dea-public-data-dev fractional-cover/fc/v2.2.0/ls8/x_-12/y_-12/2018/02/22/LS8_OLI_FC_3577_-12_-12_20180222125938.yaml

and then verifying the catalog.json files at
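For the parent update step, the set of catalog.json files touched by one dataset can be derived by walking up the key's prefixes. This is a sketch under the assumption that a catalog.json sits at every prefix level of the layout shown above, which may not match stac_parent_update.py exactly:

```python
def parent_catalog_keys(item_key):
    """List the catalog.json keys at every prefix above a dataset file,
    shallowest first (assumed one catalog per prefix level)."""
    parts = item_key.split("/")[:-1]  # drop the file name itself
    return ["/".join(parts[:i + 1]) + "/catalog.json"
            for i in range(len(parts))]
```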

omad commented 5 years ago

Looks good, Ashoka!

I've got some improvements in mind for the metadata being produced, but that can be done separately.