NASA-PDS / registry-sweepers

Scripts that run regularly on the registry database, to clean and consolidate information
Apache License 2.0
0 stars 1 forks source link

Implement more-robust response success checking for all Opensearch requests. #42

Open alexdunnjpl opened 11 months ago

alexdunnjpl commented 11 months ago

Checked for duplicates

No - I haven't checked

🧑‍🔬 User Persona(s)

dev

💪 Motivation

So that we can rely on the correctness of sweeper code and avoid (for example) long, repeated, looping queries.

📖 Additional Details

Opensearch (v1, at least), returns HTTP200 for at-least-some types of request failures, including bulk updates.

With respect to the looping/hanging queries, I suspect that's due to expiration of scrolls (confirm whether scroll parameter is per-scroll, or per-query/lifetime). This should be easy to test by dropping the scroll window to 1min and adding 30sec delays. The second or third request should return HTTP200, but an error-indicative response, if it's the root cause of the 86% stall (see other issue, need to track it down).

Acceptance Criteria

All request failure conditionals test not only the HTTP status code, but also examine the structure of the returned response.

Ideally, this should be enclosed in a wrapper function.

⚙️ Engineering Details

No response

alexdunnjpl commented 11 months ago

Looks like this may only be a problem for the _bulk API https://github.com/elastic/elasticsearch/issues/41434

Actually no - it will happen in other contexts too (this is likely what @al-niessner was talking about - are CCS remotes equivalent to shards in this context?) https://github.com/elastic/elasticsearch/issues/18978

@jordanpadams

alexdunnjpl commented 11 months ago

There's a really good chance this would be available as a side-effect of #12