NASA-PDS / registry-sweepers

Scripts that run regularly on the registry database to clean and consolidate information.
Apache License 2.0

Nonaggregate products present in "foreign" collections/bundles do not have correct ancestry. #114

Closed: alexdunnjpl closed this issue 2 months ago

alexdunnjpl commented 2 months ago

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

As previously noted in @sjoshi-jpl's log error reports, there have been numerous occurrences of the error "Multiple updates detected for doc_id [...]".

I've tracked this down to what appears to be an issue where nonaggregate products that belong to collections/bundles not indicated by the nonaggregate product's lidvid produce multiple, disjoint ancestry histories.

This will need to be handled with ad-hoc behaviour. One option is to cache the colliding updates until after the next buffer flush (since the earlier, colliding update may not yet have been written to the db at detection time), then perform a read/merge/write on each affected document to add the missing ancestry content.
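A minimal sketch of the deferral idea above, assuming an in-memory dict standing in for the registry db, overwrite semantics for ordinary buffered writes, and a set-union merge of the two ancestry fields. `DeferringUpdateBuffer` and `merge_ancestry` are hypothetical names for illustration, not registry-sweepers' actual implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical shapes: ancestry fields hold sets of lidvids, and the "db" is a dict
# mapping doc_id -> ancestry fields (a stand-in for the real registry index).
Ancestry = Dict[str, Set[str]]
ANCESTRY_FIELDS = ("parent_collection_lidvids", "parent_bundle_lidvids")


def merge_ancestry(existing: Ancestry, update: Ancestry) -> Ancestry:
    """Union each ancestry field so that neither disjoint history is lost."""
    return {f: set(existing.get(f, set())) | set(update.get(f, set())) for f in ANCESTRY_FIELDS}


class DeferringUpdateBuffer:
    """Hypothetical write buffer that defers colliding updates until after a flush."""

    def __init__(self, db: Dict[str, Ancestry]) -> None:
        self.db = db
        self.pending: Dict[str, Ancestry] = {}
        self.seen: Set[str] = set()  # every doc_id buffered so far, across flushes
        self.deferred: List[Tuple[str, Ancestry]] = []

    def add(self, doc_id: str, update: Ancestry) -> None:
        if doc_id in self.seen:
            # Collision ("Multiple updates detected"): the earlier update may still be
            # sitting in the buffer, so stash this one instead of overwriting it.
            self.deferred.append((doc_id, update))
        else:
            self.seen.add(doc_id)
            self.pending[doc_id] = update

    def flush(self) -> None:
        # Plain writes first (overwrite semantics, like a bulk index request).
        for doc_id, update in self.pending.items():
            self.db[doc_id] = {f: set(update.get(f, set())) for f in ANCESTRY_FIELDS}
        self.pending.clear()
        # The earlier writes are now in the db: read/merge/write each collision.
        for doc_id, update in self.deferred:
            self.db[doc_id] = merge_ancestry(self.db.get(doc_id, {}), update)
        self.deferred.clear()


db: Dict[str, Ancestry] = {}
buf = DeferringUpdateBuffer(db)
buf.add("tgo::1.1", {"parent_collection_lidvids": {"c1"}, "parent_bundle_lidvids": {"b1"}})
buf.add("tgo::1.1", {"parent_collection_lidvids": {"c2"}, "parent_bundle_lidvids": {"b2"}})
buf.flush()
print(sorted(db["tgo::1.1"]["parent_collection_lidvids"]))  # ['c1', 'c2']
```

Tracking `seen` across flushes (rather than only checking `pending`) means a collision with a document written in an earlier flush is also deferred and merged, instead of silently overwriting it.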

🕵️ Expected behavior

I expect registry-sweepers to run against psa-prod without producing "Multiple updates" errors.

📜 To Reproduce

Run sweepers against psa-prod

⚙️ Engineering Details

This makes sense given that the non-agg records are produced by iterating over registry-refs docs (i.e. pages of collection-specific non-agg references), so a product referenced by more than one collection yields one record per collection. This is compounded by the new iterate-by-collection-lid approach.
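To illustrate why iterating over registry-refs pages produces colliding records, here is a simplified sketch. The page shape (`collection_lidvid`, `product_lidvids`), the `collection_to_bundles` lookup and `iter_nonagg_records` are hypothetical; the lidvids are taken (abbreviated) from the example posted in this issue.

```python
from collections import Counter
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical, simplified registry-refs pages: each page belongs to exactly one
# collection and lists (some of) the non-aggregate lidvids that collection references.
refs_pages: List[Dict[str, Any]] = [
    {
        "collection_lidvid": "urn:esa:psa:context:instrument_host::1.0",
        "product_lidvids": ["urn:esa:psa:context:instrument_host:spacecraft.tgo::1.1"],
    },
    {
        "collection_lidvid": "urn:esa:psa:em16:context::1.1",
        "product_lidvids": ["urn:esa:psa:context:instrument_host:spacecraft.tgo::1.1"],
    },
]

# Hypothetical collection -> parent bundle lookup (abbreviated).
collection_to_bundles: Dict[str, List[str]] = {
    "urn:esa:psa:context:instrument_host::1.0": ["urn:esa:psa:context::1.0", "urn:esa:psa:context::2.0"],
    "urn:esa:psa:em16:context::1.1": ["urn:esa:psa:em16::1.0", "urn:esa:psa:em16_tgo_acs::1.4"],
}


def iter_nonagg_records(
    pages: List[Dict[str, Any]], bundles_by_collection: Dict[str, List[str]]
) -> Iterator[Tuple[str, Dict[str, List[str]]]]:
    """Yield one ancestry record per (page, product); each record only sees its own collection."""
    for page in pages:
        collection = page["collection_lidvid"]
        for product in page["product_lidvids"]:
            yield product, {
                "parent_collection_lidvids": [collection],
                "parent_bundle_lidvids": list(bundles_by_collection.get(collection, [])),
            }


records = list(iter_nonagg_records(refs_pages, collection_to_bundles))
counts = Counter(lidvid for lidvid, _ in records)
collisions = {lidvid: n for lidvid, n in counts.items() if n > 1}
print(collisions)  # {'urn:esa:psa:context:instrument_host:spacecraft.tgo::1.1': 2}
```

Because each page is processed independently, nothing at record-generation time knows that the same product already received a record from another collection; the duplicate only becomes visible when the updates for the same doc_id meet.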

Another way to phrase the issue:

alexdunnjpl commented 2 months ago

Example of two disjoint ancestry records generated for the same lidvid:

{
  "lidvid": "urn:esa:psa:context:instrument_host:spacecraft.tgo::1.1",
  "parent_collection_lidvids": [
    "urn:esa:psa:context:instrument_host::1.0"
  ],
  "parent_bundle_lidvids": [
    "urn:esa:psa:context::1.0",
    "urn:esa:psa:context::2.0",
    "urn:esa:psa:context::3.0",
    "urn:esa:psa:context::4.0"
  ]
}
{
  "lidvid": "urn:esa:psa:context:instrument_host:spacecraft.tgo::1.1",
  "parent_collection_lidvids": [
    "urn:esa:psa:em16:context::1.1"
  ],
  "parent_bundle_lidvids": [
    "urn:esa:psa:em16::1.0",
    "urn:esa:psa:em16::10.0",
    "urn:esa:psa:em16::100.0",
    "urn:esa:psa:em16::100.1",
[...]
    "urn:esa:psa:em16_tgo_acs::1.4",
    "urn:esa:psa:em16_tgo_cas::1.3",
    "urn:esa:psa:em16_tgo_nmd::1.2"
  ]
}
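Using the two records above (with the second bundle list limited to the entries actually shown; the elided `[...]` entries are omitted, not guessed), a quick check confirms the histories share no parents, so letting either update overwrite the other loses ancestry, whereas a set union keeps both. This is an illustrative sketch of one plausible merge, not necessarily what the fix in #116 does.

```python
# The two records from the example; record_b's bundle list is abbreviated to
# the entries shown in the issue.
record_a = {
    "lidvid": "urn:esa:psa:context:instrument_host:spacecraft.tgo::1.1",
    "parent_collection_lidvids": ["urn:esa:psa:context:instrument_host::1.0"],
    "parent_bundle_lidvids": [
        "urn:esa:psa:context::1.0", "urn:esa:psa:context::2.0",
        "urn:esa:psa:context::3.0", "urn:esa:psa:context::4.0",
    ],
}
record_b = {
    "lidvid": "urn:esa:psa:context:instrument_host:spacecraft.tgo::1.1",
    "parent_collection_lidvids": ["urn:esa:psa:em16:context::1.1"],
    "parent_bundle_lidvids": [
        "urn:esa:psa:em16::1.0", "urn:esa:psa:em16::10.0",
        "urn:esa:psa:em16::100.0", "urn:esa:psa:em16::100.1",
        "urn:esa:psa:em16_tgo_acs::1.4", "urn:esa:psa:em16_tgo_cas::1.3",
        "urn:esa:psa:em16_tgo_nmd::1.2",
    ],
}

FIELDS = ("parent_collection_lidvids", "parent_bundle_lidvids")

# Disjoint: the two histories share no parents in either field.
disjoint = all(not (set(record_a[f]) & set(record_b[f])) for f in FIELDS)

# A plain overwrite keeps only one record's ancestry; a union keeps both.
merged = {"lidvid": record_a["lidvid"]}
for f in FIELDS:
    merged[f] = sorted(set(record_a[f]) | set(record_b[f]))

print(disjoint, len(merged["parent_bundle_lidvids"]))  # True 11
```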
alexdunnjpl commented 2 months ago

Prioritised and added to the current sprint, per @jordanpadams and @tloubrieu-jpl.

alexdunnjpl commented 2 months ago

Status: in-prog, need to

alexdunnjpl commented 2 months ago

Status: complete, PR opened

alexdunnjpl commented 2 months ago

Resolved by #116