Comment on links:

stac-fastapi dynamically adds referential links to API responses. We should ignore stac-api result links with these `rel` types in updates: `"root"`, `"collection"`, `"parent"`, `"self"`, `"items"`.

In some cases, data curators/providers may use links for other purposes, like the fire vector collections. We should keep stac-api result links with these `rel` types: `"external"`. If external links are returned in the stac-api response, we should add those external links to the collection document in veda-data (see the sketch below).
Based on the discussion we had at the PR walkthrough meeting, should the requirements be updated to something like:

## Requirements
0. Dry run mode to test before updating actual collections
1. Authenticates a user to obtain a token for the ingest API
2. For each collection JSON in `veda-data/ingestion-data/collections`:
3. Requests the existing record from `collections/<collection_id>`
4. If an existing record exists, merges the summaries information from the existing collection into a copy of the collection JSON, specifically not writing the merged summaries back to the `veda-data` collection
5. Publishes the merged/best record to the target ingestion API `/collections` endpoint (see the sketch below)

Where the fourth requirement is updated to specify that the veda-data collection is not overwritten? 🤔 @anayeaye
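For concreteness, here is a minimal sketch of steps 0–5 under stated assumptions: `requests` for HTTP, a bearer token already obtained, and a hypothetical `INGEST_API` base URL (none of these names come from the actual tool):

```python
import json
import pathlib

import requests

INGEST_API = "https://ingest.example.invalid"  # hypothetical base URL


def sync_collection(path: pathlib.Path, token: str, dry_run: bool = True) -> None:
    """Reconcile one veda-data collection JSON with the target ingest API."""
    headers = {"Authorization": f"Bearer {token}"}  # step 1: token assumed obtained
    collection = json.loads(path.read_text())
    merged = dict(collection)  # work on a copy; never mutate the veda-data file
    cid = collection["id"]

    # Steps 3-4: fetch any existing record and carry its summaries into the copy.
    resp = requests.get(f"{INGEST_API}/collections/{cid}", headers=headers)
    if resp.ok and "summaries" in resp.json():
        merged["summaries"] = resp.json()["summaries"]

    # Steps 0 and 5: a dry run only reports what would change; otherwise publish.
    if dry_run:
        print(f"[dry run] would publish {cid}")
        return
    requests.post(
        f"{INGEST_API}/collections", json=merged, headers=headers
    ).raise_for_status()
```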
Thanks @botanical, I updated the requirements to reflect the outcome of the discussion we had in the PR walkthrough. So now the requirements include persisting dataset summaries that already exist in a database, but no longer include inserting those summaries into the collection records stored in this project.
My work is technically blocked by this issue: https://github.com/NASA-IMPACT/veda-architecture/issues/384 but I think this is being worked on by @slesaad (PR https://github.com/NASA-IMPACT/veda-backend/pull/293)
@anayeaye please correct me if I'm wrong 😅
@botanical that sounds right to me. After @slesaad's PR is merged you could technically run the notebook against the dev veda-backend ingest. If that works we can definitely merge this and call it done based on the dev backend test.
P.S. I just updated the acceptance criteria again to specify updating dev instead of staging.
The PR is merged, but it looks like the dev deployment is blocked by auth (see the predeploy failure here); we need to fix that first.
## What
We need a tool for reconciling the corrected collection metadata records in this project with the records in a target VEDA instance (or for bulk loading a new instance). The format can be a CLI or a notebook; it just needs to be reusable.
The architecture wiki contains identifying and usage information for the operational auth and ingest systems.
## PI Objective
https://github.com/NASA-IMPACT/veda-architecture/issues/356
## Requirements
0. Dry run mode to test before updating actual collections
1. Authenticates a user to obtain a token for the ingest API
2. For each collection JSON in `veda-data/ingestion-data/collections`:
3. Requests the existing record from `collections/<collection_id>`
4. If an existing record exists, merges the summaries information from the existing collection into a copy of the collection JSON, specifically not writing the merged summaries back to the `veda-data` collection
5. Publishes the merged/best record to the target ingestion API `/collections` endpoint
## Updated: Expected Differences
We expect the following properties to be added or updated: dynamically added `links` and the pipeline-generated `summaries` (see the discussion above).
## Dashboard concerns
The dashboard uses the staging database and depends on the `summaries` property that is added by the ingestion pipeline; we want to preserve it in the update. Summaries can be recreated using the user-defined Postgres update-collection-summaries function (example).
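As a hedged illustration of that recreation step, calling such a function from Python might look like the following; the connection string, schema, and function name are all placeholders, with the real definition living in the linked example:

```python
import psycopg  # psycopg 3

# Placeholder connection string and schema/function names; the actual
# user-defined function is defined in the database, not here.
with psycopg.connect("postgresql://user:pass@localhost/postgis") as conn:
    conn.execute(
        "SELECT dashboard.update_collection_summaries(%s)", ("collection-id",)
    )
```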
AC