Closed jordanpadams closed 1 month ago
@jordanpadams I'll investigate shortly, but is it plausible that the two documents in question exist on different nodes and that's why they lack correct provenance history?
@alexdunnjpl hmmm. that would be interesting. they both say GEO, but I wonder if we loaded that data as a test case into the EN registry?
I'll dig into that now. If it ends up being the case that they're in two nodes,
@jordanpadams if I'm interpreting the swagger docs correctly, I don't think the request/expectation in the original ticket is even valid.
latest/all subroutes are only defined for /products/{identifier}/{all | latest}
. In that case, it's reasonable to expect that no latest/all specification yields only the latest product.
But why would there be any expectation that /products/?q=somequery
would only return the latest version of the LIDs of the full set of products resulting from that query?
/products/all?q=somequery
because all
would be parsed as an identifier. And it seems implausible that there should be no way to use a custom query to get all the products you want.EDIT: Having said all that, there are actually four versions in the registry - 1.0, 2.0, 4.0, 5.0, so it looks as though the API may indeed be filtering to superseded_by=null
in that query
Looking for the v5.0 doc, it's in geo just like the others, but it doesn't have any sweepers metadata set.
I'm running a local instance of sweepers on it - my suspicion is that GEO hasn't been running sweepers, based on what I'm seeing.
But why would there be any expectation that /products/?q=somequery would only return the latest version of the LIDs of the full set of products resulting from that query?
It is a requirement of the system that only the latest versions of products be returned, by default, unless otherwise stated.
https://github.com/NASA-PDS/registry-api/issues/428 https://github.com/NASA-PDS/registry-api/issues/426
Moving the rest of the discussion to NASA-PDS/registry-api#428 since I think it is more deservedly had there.
Task definition is v5 ARNarn:aws:ecs:us-west-2:445837347542:task-definition/pds-geo-prod-registry-sweepers-task:5
despite latest version of that task being v9 arn:aws:ecs:us-west-2:445837347542:task-definition/pds-geo-prod-registry-sweepers-task:9
This may explain #123 as well, if it is also lagging behind in task-definition versions and ending up using an out-of-date image as a result.
@sjoshi-jpl is there any reason why the schedulers shouldn't be targeting the latest version of the relevant task definition?
@sjoshi-jpl re-pinging for the question above. "This doesn't matter because we're moving to MCP" is an acceptable answer, too.
If we use the latest version of the reistry-sweeper this should not happend again.
superseeded_by: null
should be only for the latest product.
N.B. the root cause here is that sweepers has not run successfully since the latest product was ingested (probably due to the timeout issue).
It's not a bug, but may resurface.
Checked for duplicates
Yes - I've already checked
🐛 Describe the bug
https://pds.nasa.gov/api/search/1/products?q=(lid%20eq%20%22urn:nasa:pds:lro_lola_edr:data_raw%22) 2 records appear.
🕵️ Expected behavior
I expected 1 product to be returned
📜 To Reproduce
https://pds.nasa.gov/api/search/1/products?q=(lid%20eq%20%22urn:nasa:pds:lro_lola_edr:data_raw%22)
🖥 Environment Info
Chrome
📚 Version of Software Used
Latest
🩺 Test Data / Additional context
No response
🦄 Related requirements
No response
⚙️ Engineering Details
No response