NASA-IMPACT / veda-data


Remove methane-farms collection from staging #116

Closed: anayeaye closed this 1 month ago

anayeaye commented 2 months ago

What

"Methane Emissions Manure Management (methane-farms)" needs to be deleted or removed.

This issue tracks what is probably a small task, removing a bit of metadata from the staging catalog, but it is also an opportunity to start discussing the VEDA data lifecycle and sunsetting.

Notes

Here are some initial thoughts about how sunsetting data could fit into the data life cycle:

  1. Open a veda-data issue so that there is a trail for the removal.
  2. Is this collection tied to a dashboard dataset config? If yes, that dataset's mdx needs to be removed first, along with any stories that use that dataset.
  3. Are there any documents or notebooks for this collection? Search veda-docs (https://github.com/NASA-IMPACT/veda-docs) for the collection id.
  4. Are we only removing database records, or are there S3 objects to remove?
    • If there are S3 objects to remove:
    • Are these objects assets of any other collections? Hrefs in different collections can point to the same file in S3. If another collection uses the objects, do not delete them.
    • If the objects are not referenced in other collections (it might be possible to infer this from veda-data/data/input_config: search https://github.com/NASA-IMPACT/veda-data/tree/main/ingestion-data/discovery-items for the bucket pattern), it is OK to delete them or change the S3 object lifecycle policy. NOTE: if this involves a large number of objects, consider editing the lifecycle policy instead; it can be more cost-effective to set an expiry date on objects than to perform S3 deletes.
  5. Remove the database records: pgSTAC implements a cascading delete, so we only need to delete the collection and all of its child item records will be deleted with it. (For example, the ingest API delete collection endpoint: https://dev.openveda.cloud/api/ingest/docs#/Collection/delete_collection_collections__collection_id__delete)
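A minimal Python sketch of the lifecycle-policy alternative from step 4, assuming boto3 is installed and AWS credentials are configured. The rule ID, prefix, and number of days used below are hypothetical placeholders, not values from this issue. One detail worth knowing: put_bucket_lifecycle_configuration replaces the bucket's entire lifecycle configuration, so the sketch reads the existing rules first and re-submits them together with the new one.

```python
"""Sketch: expire S3 objects under a prefix with a lifecycle rule instead of deleting them.

Hypothetical inputs: rule IDs, prefixes, and day counts are placeholders.
Assumes boto3 is installed and AWS credentials are configured.
"""


def build_expiration_rule(rule_id: str, prefix: str, days: int) -> dict:
    """Return one S3 lifecycle rule that expires objects under `prefix` `days` after creation."""
    if days < 1:
        raise ValueError("S3 expiration days must be a positive integer")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


def add_rule_to_bucket(bucket: str, rule: dict) -> None:
    """Add `rule` to the bucket's lifecycle configuration, keeping existing rules.

    put_bucket_lifecycle_configuration REPLACES the whole configuration,
    so the current rules are read first and re-submitted with the new one.
    """
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        rules = s3.get_bucket_lifecycle_configuration(Bucket=bucket)["Rules"]
    except ClientError as err:
        # A bucket without any lifecycle configuration reports this error code.
        if err.response["Error"]["Code"] != "NoSuchLifecycleConfiguration":
            raise
        rules = []
    # Drop any older rule with the same ID, then append the new rule.
    rules = [r for r in rules if r.get("ID") != rule["ID"]] + [rule]
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": rules}
    )
```

For example, build_expiration_rule("expire-retired-dataset", "retired-dataset/", 30) would describe expiring everything under that hypothetical prefix 30 days after each object was created; nothing changes in S3 until add_rule_to_bucket is called.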

AC

anayeaye commented 2 months ago

Related veda-config branch: https://github.com/NASA-IMPACT/veda-config/tree/nc-hogs

anayeaye commented 1 month ago

Note: It looks like s3://veda-data-store-staging/hog-farms/express_methane_cog_2020.tif is now referenced in a new methane-manure collection, so these two S3 objects should not be deleted. I will circle back and double-check the collection, but closing this should only take a very simple ingest-api delete operation.
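The shared-object check above boils down to asking which collections have an item asset pointing at a given href. A minimal sketch, assuming the items have already been fetched as plain STAC Item dicts grouped by collection id (fetching them is out of scope). The item ids and the cog_default asset key in the example are hypothetical. Hrefs are compared as exact strings, so an object referenced through a different URL form (for example an https:// URL instead of s3://) would not be caught.

```python
"""Sketch: find which collections reference a given S3 object as an item asset."""
from collections.abc import Iterable, Mapping


def collections_referencing(
    href: str, items_by_collection: Mapping[str, Iterable[dict]]
) -> set[str]:
    """Return ids of collections with at least one item asset whose href equals `href`."""
    found: set[str] = set()
    for collection_id, items in items_by_collection.items():
        for item in items:
            assets = item.get("assets", {})
            if any(asset.get("href") == href for asset in assets.values()):
                found.add(collection_id)
                break  # one matching item is enough for this collection
    return found


def safe_to_delete(
    href: str, removing: str, items_by_collection: Mapping[str, Iterable[dict]]
) -> bool:
    """True only if no collection other than `removing` references `href`."""
    return not (collections_referencing(href, items_by_collection) - {removing})


if __name__ == "__main__":
    cog = "s3://veda-data-store-staging/hog-farms/express_methane_cog_2020.tif"
    # Hypothetical, minimal items; real STAC items carry many more fields.
    catalog = {
        "methane-farms": [{"id": "example-a", "assets": {"cog_default": {"href": cog}}}],
        "methane-manure": [{"id": "example-b", "assets": {"cog_default": {"href": cog}}}],
    }
    print(sorted(collections_referencing(cog, catalog)))  # ['methane-farms', 'methane-manure']
    print(safe_to_delete(cog, "methane-farms", catalog))  # False
```

With both collections pointing at the same COG, safe_to_delete returns False for methane-farms, which matches the decision in this comment to keep the objects.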

When we use the Airflow transfer-to-production DAG, the COGs will be stored in a bucket matching the collection name.

anayeaye commented 1 month ago

After determining that we are keeping the objects, I found no other dependencies on the methane-farms collection, and I have now deleted it.
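For reference, an ingest-api collection delete can be sketched with the Python standard library. Assumptions, not confirmed in this issue: the base URL comes from the dev docs link in step 5, the route is DELETE /collections/{collection_id} as implied by that docs anchor, and the API accepts a bearer token in the Authorization header. Check the target environment's /docs page before sending anything.

```python
"""Sketch: build and send an ingest API request that deletes one collection.

Assumed (not confirmed): route DELETE {base_url}/collections/{collection_id}
and bearer-token auth.
"""
import urllib.parse
import urllib.request


def build_delete_collection_request(
    base_url: str, collection_id: str, token: str
) -> urllib.request.Request:
    """Return a DELETE request for {base_url}/collections/{collection_id}."""
    # Percent-encode the id so it cannot add extra path segments.
    quoted_id = urllib.parse.quote(collection_id, safe="")
    url = f"{base_url.rstrip('/')}/collections/{quoted_id}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


def delete_collection(base_url: str, collection_id: str, token: str) -> int:
    """Send the delete and return the HTTP status code.

    Per this issue, pgSTAC cascades the delete to the collection's items.
    """
    req = build_delete_collection_request(base_url, collection_id, token)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

urlopen raises urllib.error.HTTPError on a 4xx or 5xx response, so a failed delete surfaces as an exception rather than as a returned status code. Building the request separately makes it easy to inspect the URL before anything is sent.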

From veda-config, it looks like the replacement collection is methane-manure, which is unfortunately invalid. I will reach out to see whether we can help get the collection's STAC metadata corrected or whether it is still in the testing stage.
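The issue does not say what is wrong with methane-manure. As a hypothetical first pass before running a full JSON Schema validator (for example pystac's Collection.validate() or stac-validator), here is a sketch that only flags missing required fields from the STAC 1.0 Collection spec. It is not a substitute for real validation.

```python
"""Sketch: quick pre-check of required STAC 1.0 Collection fields (not full validation)."""

REQUIRED_FIELDS = ("type", "stac_version", "id", "description", "license", "extent", "links")


def missing_collection_fields(collection: dict) -> list[str]:
    """Return a list of problems found; an empty list means the pre-check passed."""
    problems = [field for field in REQUIRED_FIELDS if field not in collection]
    if "type" in collection and collection["type"] != "Collection":
        problems.append('type must be "Collection"')
    extent = collection.get("extent")
    if isinstance(extent, dict):
        # Both extent members are required by the Collection spec.
        if "bbox" not in (extent.get("spatial") or {}):
            problems.append("extent.spatial.bbox")
        if "interval" not in (extent.get("temporal") or {}):
            problems.append("extent.temporal.interval")
    return problems
```

A collection that passes this check can still fail schema validation (for example a malformed bbox), which is why a full validator should run afterwards.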

anayeaye commented 1 month ago

Also, here is a PR to fix up the replacement collection: https://github.com/NASA-IMPACT/veda-data/pull/120