NASA-PDS / registry-sweepers

Scripts that run regularly on the registry database to clean and consolidate information
Apache License 2.0

sweepers not run(ning) against geo-prod #124

Open jordanpadams opened 1 month ago

jordanpadams commented 1 month ago

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

When I run a members query on a collection that should have members, it returns no results.

🕵️ Expected behavior

I expected the query to return the collection's member products.

📜 To Reproduce

curl -X GET https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data::1.1/members | json_pp
{
   "data" : [],
   "summary" : {
      "hits" : 0,
      "limit" : 100,
      "properties" : [],
      "q" : "",
      "search_after" : [],
      "sort" : [],
      "took" : 25
   }
}

🖥 Environment Info

Chrome / macOS

📚 Version of Software Used

Latest deployed

🩺 Test Data / Additional context

No response

🦄 Related requirements

This is blocking:

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

jordanpadams commented 1 month ago

@alexdunnjpl @tloubrieu-jpl any idea why this query is not working? We noticed this after several attempts to run deep-archive failed and produced incorrect data products. This has blocked us on several occasions before, and we thought we had fixed those issues, so I'm not sure what happened.

alexdunnjpl commented 1 month ago

@jordanpadams do you have an example product (full URL to the document preferred) that should be appearing in this query?

jordanpadams commented 1 month ago

@alexdunnjpl here is an opensearch query with the associated data products: https://search-geo-prod-6iz6lwiw6luyffpsq52ndsrtbu.us-west-2.es.amazonaws.com/_dashboards/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15y,to:now))&_a=(columns:!(lid,'ops:Tracking_Meta%2Fops:archive_status'),filters:!(),index:'04de9280-9067-11ed-aa4d-b9457fec4322',interval:auto,query:(language:kuery,query:'lid:urn%5C:nasa%5C:pds%5C:msl_gt_diagenesis_supplement%5C:data*'),sort:!())

| Time | ops:Tracking_Meta/ops:archive_status | _id |
| --- | --- | --- |
| Mar 19, 2024 @ 07:51:22.348 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:veins::1.0 |
| Mar 19, 2024 @ 07:51:22.270 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:target_classification::1.0 |
| Mar 19, 2024 @ 07:51:22.170 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:nodule_rich_bedrock::1.0 |
| Mar 19, 2024 @ 07:51:22.070 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:nodules::1.0 |
| Mar 19, 2024 @ 07:51:21.985 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:local_rmsep_sigma20_win50_n20::1.0 |
| Mar 19, 2024 @ 07:51:21.947 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:dark_strata::1.0 |
| Mar 19, 2024 @ 07:51:21.862 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:cements::1.0 |
| Mar 19, 2024 @ 07:51:19.577 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data::1.1 |
| Nov 2, 2022 @ 10:55:10.771 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data::1.0 |

https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:cements::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:nodule_rich_bedrock::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:nodules::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:dark_strata::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:local_rmsep_sigma20_win50_n20::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:target_classification::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:veins::1.0

alexdunnjpl commented 1 month ago

@jordanpadams taking https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:cements::1.0 as an example, there is no sweepers metadata present in the document.
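
For reference, one way to verify this is to pull the document straight from the GEO OpenSearch node and request only the sweeper-written fields. A minimal sketch follows; the index name (registry) and the ancestry field name (ops:Provenance/ops:parent_collection_identifier) are assumptions based on the ancestry sweeper, and the credentials are placeholders:

import json
import requests

# Hedged sketch: fetch the registry document for one product and ask for the
# ancestry field the sweepers are expected to write. The index and field
# names are assumptions; adjust to the actual schema.
resp = requests.post(
    "https://search-geo-prod-6iz6lwiw6luyffpsq52ndsrtbu.us-west-2.es.amazonaws.com/registry/_search",
    auth=("user", "pass"),  # placeholder credentials
    json={
        "query": {"term": {"lidvid": "urn:nasa:pds:msl_gt_diagenesis_supplement:data:cements::1.0"}},
        "_source": ["lidvid", "ops:Provenance/ops:parent_collection_identifier"],
    },
)
print(json.dumps(resp.json(), indent=2))

If the hit comes back with only the lidvid and no ops:Provenance/* field, the ancestry sweeper has not processed that document.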

Has sweepers been running on whichever OpenSearch node hosts the relevant product documents?

jordanpadams commented 1 month ago

@alexdunnjpl I have no idea...

alexdunnjpl commented 1 month ago

Plan to run local sweepers against GEO. Currently blocked by GEO node getting hammered by MCP migration.

alexdunnjpl commented 1 month ago

Possibly due to ReadErrors being encountered.

@sjoshi-jpl is there any record available of if/when the geo-prod sweepers jobs started failing?

alexdunnjpl commented 1 month ago

Initial assumption about load was incorrect: there is a block of very large documents in GEO, resulting in some requests taking more than an order of magnitude longer than others, specifically in repairkit (which does not pull document subsets, so it fetches each document in full).

Currently resolving this by dropping the repairkit page size to 500 and increasing the request timeout to 180 seconds.

At 500 docs/page, the maximum observed request time was 1m42s.
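
For illustration, here is a minimal sketch of what that mitigation looks like with the opensearch-py client: a reduced page size per request, a raised client timeout, and search_after pagination. This is not the actual repairkit code; the credentials, index name, and sort field are assumptions:

from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=["https://search-geo-prod-6iz6lwiw6luyffpsq52ndsrtbu.us-west-2.es.amazonaws.com"],
    http_auth=("user", "pass"),  # placeholder credentials
    timeout=180,                 # raised per-request timeout, per the fix above
)

def iterate_docs(index="registry", page_size=500):
    # Yield every document in the index, page_size docs per request, using
    # search_after pagination. Sorting on "lidvid" assumes it is a sortable
    # keyword field in the registry schema.
    body = {
        "query": {"match_all": {}},
        "size": page_size,
        "sort": [{"lidvid": "asc"}],
    }
    while True:
        hits = client.search(index=index, body=body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        body["search_after"] = hits[-1]["sort"]

Smaller pages bound the worst-case payload per request, which is what matters here since repairkit fetches whole documents; a sweeper that only needs a few fields could instead shrink the payload with a _source filter.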

alexdunnjpl commented 1 month ago

Note to self - solved but not yet implemented, pending discussion with @tloubrieu-jpl

jordanpadams commented 1 month ago

@alexdunnjpl @tloubrieu-jpl where are we at with this? It looks like this may be resolved?

alexdunnjpl commented 1 month ago

@jordanpadams I need to loop back to it with @tloubrieu-jpl to decide on how we want to tweak the timeout parameters to resolve the issue.

I've run the sweepers locally to resolve the state of GEO having no ancestry metadata, and there's a good chance that sweepers are now running against GEO (since the massive docs have been dealt with and are no longer fetched by the ancestry sweeper), but the root cause remains outstanding.
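
For the record, running the sweepers locally against GEO amounts to something like the sketch below. The driver script path and the environment variable names (PROV_ENDPOINT, PROV_CREDENTIALS) are assumptions based on this repo's docker setup, so check the README for the actual invocation:

import os
import runpy

# Hedged sketch: point the sweepers driver at the GEO OpenSearch node.
# Variable names and script path are assumptions; credentials are placeholders.
os.environ["PROV_ENDPOINT"] = "https://search-geo-prod-6iz6lwiw6luyffpsq52ndsrtbu.us-west-2.es.amazonaws.com"
os.environ["PROV_CREDENTIALS"] = '{"admin": "placeholder-password"}'
runpy.run_path("docker/sweepers_driver.py", run_name="__main__")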

If it's important to close this out let me know - should be a quick thing, I've just been laser-focused on the migration and have been ignoring everything else.

jordanpadams commented 1 month ago

@alexdunnjpl 👍 all good. just checking.

tloubrieu-jpl commented 3 weeks ago

This is going to be worked on after the migration to MCP is completed.