NASA-PDS / registry-api

Web API service for the PDS Registry, providing the implementation of the PDS Search API (https://github.com/nasa-pds/pds-api) for the PDS Registry.
https://nasa-pds.github.io/pds-api
Apache License 2.0

Search criteria not producing expected matches #287

Closed jjacob7734 closed 1 year ago

jjacob7734 commented 1 year ago

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I apply this constraint to my search, no matches are found, even though the result set without the constraint includes items that should match: orex:spatial.orex:target_range lt 400.0

🕵️ Expected behavior

I expected the items with orex:spatial.orex:target_range value less than 400 to appear in the search results.

📜 To Reproduce

  1. Run curl --get 'https://pds.nasa.gov/api/search/1/products' --data-urlencode 'limit=10' --data-urlencode 'q=((ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu") and (ref_lid_instrument eq "urn:nasa:pds:context:instrument:ovirs.orex"))' | json_pp | grep -A 1 target_range
  2. Observe a number of hits with orex:spatial.orex:target_range around 177 (which is less than 400).
  3. Run curl --get 'https://pds.nasa.gov/api/search/1/products' --data-urlencode 'limit=10' --data-urlencode 'q=((ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu") and (ref_lid_instrument eq "urn:nasa:pds:context:instrument:ovirs.orex") and (orex:spatial.orex:target_range lt 400.0))' | json_pp
  4. Observe that there are no hits, so the previously observed hits with values around 177 didn't match this time.
  5. I also tried wrapping the value in double quotes, as "400", but that made no difference.
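
The two requests in the steps above differ only in the extra clause appended to q. A minimal Python sketch of how the two query strings are composed, using only the standard library. The helper names build_query and build_url are hypothetical; the endpoint, field names and values are copied from the curl commands above. It only builds the URLs, so fetching them and comparing the hits is left out.

```python
from urllib.parse import urlencode

# Endpoint and clauses copied from the reproduction steps above.
BASE_URL = "https://pds.nasa.gov/api/search/1/products"

BASE_CLAUSES = [
    '(ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu")',
    '(ref_lid_instrument eq "urn:nasa:pds:context:instrument:ovirs.orex")',
]
RANGE_CLAUSE = "(orex:spatial.orex:target_range lt 400.0)"


def build_query(clauses):
    """Join parenthesised clauses with 'and' and wrap the result in outer parentheses."""
    return "(" + " and ".join(clauses) + ")"


def build_url(clauses, limit=10, base_url=BASE_URL):
    """Return the request URL with URL-encoded 'limit' and 'q' parameters."""
    return base_url + "?" + urlencode({"limit": limit, "q": build_query(clauses)})


if __name__ == "__main__":
    # Step 1: query without the range constraint.
    print(build_url(BASE_CLAUSES))
    # Step 3: same query with the range constraint appended.
    print(build_url(BASE_CLAUSES + [RANGE_CLAUSE]))
```

Keeping the clauses in a list makes it easy to add or drop the range constraint, or to pass a different base_url, without retyping the rest of the query.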

jordanpadams commented 1 year ago

@jjacob7734 can we verify that all the OREX data has actually been ingested into the registry? There is no guarantee that it has all been loaded or, more specifically, that the data you are looking for has been loaded.

we have a snapshot of their data set from about a year ago here: https://pds.nasa.gov/data/pds4/test-data/registry/orex.ovirs/ (you can ping the SAs to request access to this server)

We could probably just do an overall product count check: compare the number of labels in the collection (XML files) with the number of products returned by a query for all OVIRS data.
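
The label side of that count check could be sketched roughly as follows. This assumes the snapshot has been copied to a local directory and that every .xml file under it is a product label, which may overcount if the tree contains other XML files. The function names and the api_hits argument are hypothetical; obtaining the API-side count is left out because the response format is not shown in this thread.

```python
from pathlib import Path


def count_labels(root):
    """Count files ending in .xml (case-insensitive) anywhere under root."""
    return sum(
        1
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".xml"
    )


def missing_products(label_count, api_hits):
    """Return how many labels have no corresponding registry product (never negative)."""
    return max(label_count - api_hits, 0)
```

A non-zero result from missing_products would suggest that some labels were never ingested; a zero result only shows that the counts agree, not that each individual product is present.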

jordanpadams commented 1 year ago

Actually, from their Kibana dashboard, I can see they have 1,146,784 OVIRS products ingested.

jjacob7734 commented 1 year ago

Yeah, it looks like there should be matches. In the instructions to reproduce, the first query does get matches that show a target range around 177, but when I add the requirement that target_range < 400 I get no matches.

alexdunnjpl commented 1 year ago

Interestingly, it looks like retrieval based on equality doesn't work.

Given (among others)

"orex:spatial.orex:target_range" : [
   "177.51266033499203"

The following queries fail to hit

curl --get 'https://pds.nasa.gov/api/search/1/products'     --data-urlencode 'limit=10'     --data-urlencode 'q=((ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu") and (ref_lid_instrument eq "urn:nasa:pds:context:instrument:ovirs.orex") and (orex:spatial.orex:target_range eq "177.51266033499203"))' | json_pp

curl --get 'https://pds.nasa.gov/api/search/1/products'     --data-urlencode 'limit=10'     --data-urlencode 'q=((ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu") and (ref_lid_instrument eq "urn:nasa:pds:context:instrument:ovirs.orex") and (orex:spatial.orex:target_range like "1*"))' | json_pp

ref_lid_instrument and ref_lid_target are in the production index, while there is no mention of target_range.

Interestingly, like * produces the expected full set of hits, but that could just be getting optimized out of the query.

@jordanpadams I realise that #281 is shown as closed, but was it confirmed that the tested "semi-random" fields weren't indexed (can't see how they couldn't be, given my understanding of OpenSearch)? And if those fields were present due to dynamic reindexing when products with new fields are added, was the fix with that dynamic addition ever deployed to prod?

alexdunnjpl commented 1 year ago

@jordanpadams @jjacob7734 running latest-tagged harvest against bundle at https://pds.nasa.gov/data/pds4/test-data/registry/orex.ovirs.small/ results in

$ curl -k -u admin:admin https://localhost:9200/registry | json_pp | grep target_range -A 1
            "orex:spatial/orex:target_range" : {
               "type" : "keyword"

so it looks like this is the result of data being harvested prior to implementation of all-fields search support (or use of an equally-old release)
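
The same mapping check can be done programmatically instead of with grep. A minimal sketch, assuming the index JSON has already been parsed into a dict and follows the usual OpenSearch layout (index name, then mappings, then properties). field_type is a hypothetical helper, and the sample dict is illustrative: only the target_range entry is taken from the output above.

```python
def field_type(index_info, index_name, field_name):
    """Return the mapped type of field_name, or None if it is not in the mapping."""
    properties = (
        index_info.get(index_name, {})
        .get("mappings", {})
        .get("properties", {})
    )
    entry = properties.get(field_name)
    return entry.get("type") if entry is not None else None


# Illustrative input shaped like the index JSON; not real registry output.
sample = {
    "registry": {
        "mappings": {
            "properties": {
                "orex:spatial/orex:target_range": {"type": "keyword"},
            }
        }
    }
}

print(field_type(sample, "registry", "orex:spatial/orex:target_range"))  # keyword
print(field_type(sample, "registry", "some:missing_field"))  # None
```

Note that the lookup uses the field name exactly as it appears in the mapping, with a slash between class and attribute, whereas the search query writes the same field with a dot.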

Fix is to reingest all such data with an updated version of harvest.

Leaving ticket open in case there is additional action needed (notifying some/all users, arranging for wholesale reingestion of large quantities of data on some/all nodes, etc)

alexdunnjpl commented 1 year ago

@jordanpadams pinging SBN to re-ingest

gxtchen commented 1 year ago

curl --get 'https://pds.nasa.gov/api/search/1/products' --data-urlencode 'limit=10' --data-urlencode 'q=((ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu") and (ref_lid_instrument eq "urn:nasa:pds:context:instrument:ovirs.orex") and (orex:spatial.orex:target_range lt 400.0))' | json_pp

This is still not returning any hits. Has the data been re-ingested yet? Should I just harvest https://pds.nasa.gov/data/pds4/test-data/registry/orex.ovirs.small/ locally for the test?

jordanpadams commented 1 year ago

@gxtchen you need to test on gamma, not on production, since we haven't deployed there yet :-)

tloubrieu-jpl commented 1 year ago

Hi @gxtchen, the latest registry-api is not deployed in production yet, so you need to test this ticket on gamma, with base URL https://pds.nasa.gov/api/search-en-gamma/1/