NASA-PDS / registry-sweepers

Scripts that run regularly on the registry database, to clean and consolidate information
Apache License 2.0

Provenance bulk update db writes fail under specific conditions related to presence of CCRs #34

Closed alexdunnjpl closed 11 months ago

alexdunnjpl commented 11 months ago

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

From prod cloudwatch logs

2023-07-26 01:29:33,853::pds.registrysweepers.utils::ERROR::Attempt to update document urn:nasa:pds:mars2020_rimfax:data_hk:xs1_0386_0701211713ehm0140733n__a_f0r1aut_04096j.csv::2.0 unexpectedly failed: {'type': 'invalid_index_name_exception', 'reason': 'Invalid index name [registry,geo-prod-ccs:registry,naif-prod-ccs:registry,sbnumd-prod-ccs:registry], must not contain the following characters [ , ", *, \\, <, |, ,, >, /, ?]', 'index': 'registry,geo-prod-ccs:registry,naif-prod-ccs:registry,sbnumd-prod-ccs:registry', 'index_uuid': '_na_'}

🕵️ Expected behavior

I expected [...]

📜 To Reproduce

1. 2. 3. ...

🖥 Environment Info

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

alexdunnjpl commented 11 months ago

Error is due to this erroneously constructed path. The registry,someRemote:registry,someOtherRemote:registry chunk is being interpreted as a single index-name literal, which fails validation because index names may not contain commas.
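To illustrate why the whole chunk is rejected, here is a minimal sketch that checks a name against the forbidden characters listed in the error message above. The constant and function names are hypothetical, and the check only mirrors the character list from that log line; it is not OpenSearch's actual validation code.

```python
# Characters copied from the error message in the log above
# (the first entry in that list is a space). Illustration only.
INVALID_INDEX_CHARS = set(' "*\\<|,>/?')


def invalid_chars_in(index_name):
    """Return the sorted forbidden characters that appear in index_name."""
    return sorted(set(index_name) & INVALID_INDEX_CHARS)


# The multi-target string from the log, treated as one index name.
multi_target = (
    "registry,geo-prod-ccs:registry,"
    "naif-prod-ccs:registry,sbnumd-prod-ccs:registry"
)

print(invalid_chars_in(multi_target))  # [',']
print(invalid_chars_in("registry"))    # []
```

Under this check the comma is the only offending character: the colon in the remote prefixes is not in the listed set, so it is the comma-joined form that trips the write.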

This is unchanged since the script's inception, and it's unclear whether or how it ever worked in the context of cross-cluster remotes (unless the pre-OpenSearch ES supported this behaviour?)

@jordanpadams the fix here is to either write updates to each remote separately, or limit writes to registry proper. I strongly assume the latter is correct behaviour and the former is not even possible, but would like confirmation of that - metadata goes in our registry only, right?
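A rough sketch of the second option (limit writes to the registry proper), assuming the write target is derived from the same comma-separated expression used for reads. The function names are hypothetical and this is not the registry-sweepers implementation; the only thing taken as given is that remote targets carry a cluster: prefix, as in the logged index string.

```python
def split_targets(index_expression):
    """Split a comma-separated index expression into (local, remote) lists.

    A target is treated as remote if it has a "cluster:" prefix.
    """
    local, remote = [], []
    for target in index_expression.split(","):
        target = target.strip()
        if not target:
            continue
        (remote if ":" in target else local).append(target)
    return local, remote


def bulk_write_path(index_expression):
    """Build a bulk-update path that addresses only the local index."""
    local, _remote = split_targets(index_expression)
    if len(local) != 1:
        # Assumption: exactly one local index is expected for writes.
        raise ValueError(f"expected exactly one local index, got {local}")
    return f"/{local[0]}/_bulk"


read_target = "registry,geo-prod-ccs:registry,naif-prod-ccs:registry"
print(bulk_write_path(read_target))  # /registry/_bulk
```

Reads can keep using the full expression; only the write path drops the remote targets.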

Actually that raises an important question - where does the metadata for products in other clusters get written?

tloubrieu-jpl commented 9 months ago

Skip I&T: the solution is to launch one process per discipline node, instead of a single process that relies on CCR to reach the discipline nodes. The design/architecture has changed, so this bug no longer needs to be tested.