NASA-PDS / registry-sweepers

Scripts that run regularly on the registry database to clean and consolidate information
Apache License 2.0

sweepers not run(ning) against geo-prod #124

Open jordanpadams opened 6 months ago

jordanpadams commented 6 months ago

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

When I attempt a members query on a collection that should return members, it returns no results.

🕵️ Expected behavior

I expected it to work.

📜 To Reproduce

curl --get https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data::1.1/members | json_pp
{
   "data" : [],
   "summary" : {
      "hits" : 0,
      "limit" : 100,
      "properties" : [],
      "q" : "",
      "search_after" : [],
      "sort" : [],
      "took" : 25
   }
}
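
For a scripted version of the same check, a minimal Python sketch (using the `requests` library, which is my addition; the URL comes from the repro above) could look like this:

import requests

MEMBERS_URL = (
    "https://pds.nasa.gov/api/search/1/products/"
    "urn:nasa:pds:msl_gt_diagenesis_supplement:data::1.1/members"
)

# Ask the registry API for the collection's members and report the hit count.
response = requests.get(MEMBERS_URL, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
summary = response.json().get("summary", {})
hits = summary.get("hits", 0)
print(f"members query returned {hits} hit(s)")
if hits == 0:
    print("empty result - ancestry metadata for this collection may be missing")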

🖥 Environment Info

Chrome / macOS

📚 Version of Software Used

Latest deployed

🩺 Test Data / Additional context

No response

🦄 Related requirements

This is blocking:

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

jordanpadams commented 6 months ago

@alexdunnjpl @tloubrieu-jpl any idea why this query is not working? We have noticed that several attempts to run deep-archive have failed and produced incorrect data products. This has blocked us on several occasions before, and we thought we had fixed those issues, so I'm not sure what happened.

alexdunnjpl commented 6 months ago

@jordanpadams do you have an example product (full URL to the document preferred) that should be appearing in this query?

jordanpadams commented 6 months ago

@alexdunnjpl here is an opensearch query with the associated data products: https://search-geo-prod-6iz6lwiw6luyffpsq52ndsrtbu.us-west-2.es.amazonaws.com/_dashboards/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15y,to:now))&_a=(columns:!(lid,'ops:Tracking_Meta%2Fops:archive_status'),filters:!(),index:'04de9280-9067-11ed-aa4d-b9457fec4322',interval:auto,query:(language:kuery,query:'lid:urn%5C:nasa%5C:pds%5C:msl_gt_diagenesis_supplement%5C:data*'),sort:!())

| Time | ops:Tracking_Meta/ops:archive_status | _id |
| --- | --- | --- |
| Mar 19, 2024 @ 07:51:22.348 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:veins::1.0 |
| Mar 19, 2024 @ 07:51:22.270 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:target_classification::1.0 |
| Mar 19, 2024 @ 07:51:22.170 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:nodule_rich_bedrock::1.0 |
| Mar 19, 2024 @ 07:51:22.070 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:nodules::1.0 |
| Mar 19, 2024 @ 07:51:21.985 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:local_rmsep_sigma20_win50_n20::1.0 |
| Mar 19, 2024 @ 07:51:21.947 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:dark_strata::1.0 |
| Mar 19, 2024 @ 07:51:21.862 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data:cements::1.0 |
| Mar 19, 2024 @ 07:51:19.577 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data::1.1 |
| Nov 2, 2022 @ 10:55:10.771 | archived | urn:nasa:pds:msl_gt_diagenesis_supplement:data::1.0 |

https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:cements::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:nodule_rich_bedrock::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:nodules::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:dark_strata::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:local_rmsep_sigma20_win50_n20::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:target_classification::1.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:veins::1.0
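
For reference, a rough Python equivalent of that dashboard query is sketched below; the index name and credentials are placeholders I've assumed, and the field names come from the dashboard columns shown above:

import requests

OPENSEARCH_URL = "https://search-geo-prod-6iz6lwiw6luyffpsq52ndsrtbu.us-west-2.es.amazonaws.com"
INDEX = "registry"  # placeholder - the real index name is not shown in this thread

# Wildcard query on lid, returning only the two columns shown in the dashboard.
query = {
    "size": 20,
    "_source": ["lid", "ops:Tracking_Meta/ops:archive_status"],
    "query": {"wildcard": {"lid": "urn:nasa:pds:msl_gt_diagenesis_supplement:data*"}},
}

resp = requests.get(
    f"{OPENSEARCH_URL}/{INDEX}/_search",
    json=query,
    auth=("user", "password"),  # placeholder credentials
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    status = hit["_source"].get("ops:Tracking_Meta/ops:archive_status")
    print(hit["_id"], status)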

alexdunnjpl commented 6 months ago

@jordanpadams taking https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:msl_gt_diagenesis_supplement:data:cements::1.0 as an example, there is no sweepers metadata present in the document.

Has sweepers been running on whichever OpenSearch node hosts the relevant product documents?
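
One quick way to check that from the public API side is to fetch the product document and look for sweepers-generated fields. A sketch follows; note that the exact ancestry field names are my assumption and may differ from what registry-sweepers actually writes:

import requests

PRODUCT_URL = (
    "https://pds.nasa.gov/api/search/1/products/"
    "urn:nasa:pds:msl_gt_diagenesis_supplement:data:cements::1.0"
)
# Assumed names of the fields the ancestry sweeper adds; adjust to the real keys.
ANCESTRY_FIELDS = (
    "ops:Provenance.ops:parent_collection_identifier",
    "ops:Provenance.ops:parent_bundle_identifier",
)

doc = requests.get(PRODUCT_URL, headers={"Accept": "application/json"}, timeout=30)
doc.raise_for_status()
properties = doc.json().get("properties", {})
missing = [field for field in ANCESTRY_FIELDS if field not in properties]
if missing:
    print("sweepers metadata missing:", ", ".join(missing))
else:
    print("sweepers metadata present")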

jordanpadams commented 6 months ago

@alexdunnjpl I have no idea...

alexdunnjpl commented 6 months ago

Plan to run local sweepers against GEO. Currently blocked by GEO node getting hammered by MCP migration.

alexdunnjpl commented 6 months ago

Possibly due to ReadErrors being encountered.

@sjoshi-jpl is there any record available of if/when the geo-prod sweepers jobs started failing?

alexdunnjpl commented 6 months ago

Initial assumption about load was incorrect: there is a block of very large documents in GEO, resulting in some requests taking more than an order of magnitude longer than others, specifically in repairkit (which does not pull document subsets).

Currently resolving by dropping repairkit page size to 500 and increasing timeout to 180sec.

At 500 per page, the maximum observed request time was 1m42s.
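
As an illustration only (this is not the actual repairkit code), the tuning described above amounts to something like the following, where the host, index, sort field, and credentials are assumptions:

import requests

OPENSEARCH_URL = "https://<opensearch-host>"  # placeholder
INDEX = "registry"                            # placeholder
PAGE_SIZE = 500        # reduced page size, per the comment above
REQUEST_TIMEOUT = 180  # seconds, per the comment above


def iter_pages(auth):
    """Page through documents PAGE_SIZE at a time using search_after."""
    search_after = None
    while True:
        body = {
            "size": PAGE_SIZE,
            "sort": [{"lidvid": "asc"}],  # assumed unique sort field
            "query": {"match_all": {}},
        }
        if search_after is not None:
            body["search_after"] = search_after
        resp = requests.get(f"{OPENSEARCH_URL}/{INDEX}/_search",
                            json=body, auth=auth, timeout=REQUEST_TIMEOUT)
        resp.raise_for_status()
        hits = resp.json()["hits"]["hits"]
        if not hits:
            return
        yield hits
        search_after = hits[-1]["sort"]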

alexdunnjpl commented 6 months ago

Note to self - solved but not yet implemented, pending discussion with @tloubrieu-jpl

jordanpadams commented 6 months ago

@alexdunnjpl @tloubrieu-jpl where are we at with this? It looks like this may be resolved?

alexdunnjpl commented 6 months ago

@jordanpadams I need to loop back to it with @tloubrieu-jpl to decide on how we want to tweak the timeout parameters to resolve the issue.

I've run the sweepers locally to resolve the state of GEO having no ancestry metadata, and there's a good chance that sweepers are now running against GEO (since the massive docs are dealt with and no longer fetched by the ancestry sweeper), but the root cause remains outstanding.

If it's important to close this out let me know - should be a quick thing, I've just been laser-focused on the migration and have been ignoring everything else.
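
For context on why pulling only document subsets sidesteps the huge-document problem, here is a sketch of `_source` filtering; field names, host, index, and credentials are placeholders:

import requests

body = {
    "size": 500,
    # Only the listed fields are returned, so oversized documents stay cheap to fetch.
    "_source": ["lidvid", "ref_lid_collection"],  # assumed field names
    "query": {"match_all": {}},
}
resp = requests.get(
    "https://<opensearch-host>/<registry-index>/_search",  # placeholders
    json=body,
    auth=("user", "password"),  # placeholder credentials
    timeout=180,
)
resp.raise_for_status()
print("returned", len(resp.json()["hits"]["hits"]), "trimmed documents")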

jordanpadams commented 6 months ago

@alexdunnjpl 👍 all good. just checking.

tloubrieu-jpl commented 5 months ago

This is going to be worked on after the migration to MCP is completed.

tloubrieu-jpl commented 2 weeks ago

@alexdunnjpl @sjoshi-jpl we said we would work on this after the migration to MCP. Where are we with the sweepers running on the nodes? Thanks.

alexdunnjpl commented 2 weeks ago

@tloubrieu-jpl this is probably a perfunctory close at this point once it can be retested - I'll defer to Sagar on status, but it'll become clear once the sweeper is running on GEO.

tloubrieu-jpl commented 2 weeks ago

blocked by https://github.com/NASA-PDS/registry-sweepers/issues/147