NASA-PDS / registry-sweepers

Scripts that run regularly on the registry database, to clean and consolidate information
Apache License 2.0

Refresh ancestry metadata on all nodes #85

Closed (alexdunnjpl closed 6 months ago)

alexdunnjpl commented 7 months ago

💡 Description

See #83

Ancestry metadata has been written to all swept domains, but the ancestry fields had not been added to the index when it was written.

Current versions of registry-sweepers ensure index state, but this is not retroactive. No-op updates do not trigger re-indexing, so it's necessary to either re-index or (much simpler) run a local instance of the ancestry sweeper against each node, with the following set in ancestry's generate_updates():

update_content = {
    METADATA_PARENT_BUNDLE_KEY: ["DEVELOPMENT PLACEHOLDER"],
    METADATA_PARENT_COLLECTION_KEY: ["DEVELOPMENT PLACEHOLDER"],
}

This will trigger index updates for the relevant fields, and the next ECS sweeper run will set the values back to what they should be.
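For illustration, a minimal sketch of what the placeholder run amounts to, assuming the updates end up as partial-document updates in an OpenSearch `_bulk` request. The two key values, the `registry` index name, and both helper functions are hypothetical stand-ins rather than the actual registry-sweepers code; only the `update_content` shape comes from the snippet above.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed values: the real constants are defined in registry-sweepers and may differ.
METADATA_PARENT_BUNDLE_KEY = "ops:Provenance/ops:parent_bundle_identifier"
METADATA_PARENT_COLLECTION_KEY = "ops:Provenance/ops:parent_collection_identifier"
PLACEHOLDER = "DEVELOPMENT PLACEHOLDER"

Update = Tuple[str, Dict[str, List[str]]]


def generate_placeholder_updates(doc_ids: Iterable[str]) -> Iterator[Update]:
    """Hypothetical helper: yield (doc_id, partial document) pairs.

    Every document gets the same placeholder values, so the write differs
    from what is already stored and is not treated as a no-op.
    """
    for doc_id in doc_ids:
        yield doc_id, {
            METADATA_PARENT_BUNDLE_KEY: [PLACEHOLDER],
            METADATA_PARENT_COLLECTION_KEY: [PLACEHOLDER],
        }


def to_bulk_body(index: str, updates: Iterable[Update]) -> str:
    """Hypothetical helper: serialize updates as NDJSON for the _bulk endpoint.

    Each update is an action line followed by a {"doc": ...} line.
    """
    lines = []
    for doc_id, content in updates:
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": content}))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


if __name__ == "__main__":
    # "registry" and the LIDVID below are made-up example values.
    ids = ["urn:nasa:pds:example_bundle:example_collection:product::1.0"]
    print(to_bulk_body("registry", generate_placeholder_updates(ids)))
```

Because the placeholder differs from the stored values, each document is actually rewritten, which is the point of the exercise. The real values are restored by the next regular sweeper run.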

alexdunnjpl commented 7 months ago

@jordanpadams please triage. This won't take a lot of active work time - mostly a matter of collecting the creds for each node and running the sweepers in the background.

jordanpadams commented 7 months ago

@alexdunnjpl per the credentials, we have an admin user we can use for this.

alexdunnjpl commented 6 months ago

@jordanpadams it should be sufficient to write the placeholder values, as they'll be overwritten on the next ECS sweeper run.

Okay to close this once the last couple of nodes complete?

alexdunnjpl commented 6 months ago

psa-prod is proving too voluminous to process on my laptop, so it will require either

jordanpadams commented 6 months ago

@alexdunnjpl let me know if there is anything you need from me to help with this. PSA does have by far the largest data volume.

alexdunnjpl commented 6 months ago

@jordanpadams I should be able to sort this myself tomorrow, thanks though!

alexdunnjpl commented 6 months ago

Fix for PSA is sitting in my IDE - will hold off on pulling the trigger until @sjoshi-jpl has confirmed that PSA sweeper execution/scheduling is paused, so as to avoid overlapping load.

alexdunnjpl commented 6 months ago

psa complete

Still need the #39 implementation before PSA can be redeployed/re-enabled and the placeholders overwritten.