NASA-PDS / registry-api

Web API service for the PDS Registry, providing the implementation of the PDS Search API (https://github.com/nasa-pds/pds-api) for the PDS Registry.
https://nasa-pds.github.io/pds-api
Apache License 2.0

field case in response and query have mismatched cases #362

Closed tloubrieu-jpl closed 1 year ago

tloubrieu-jpl commented 1 year ago

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

When I did this query https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:orex.ovirs:data_calibrated::11.0/members?limit=2

The property, for example orex:spatial.orex:latitude, is shown in all lower case in the response (summary/properties or product/properties).

But to query this property in the q parameter, it has to be written as orex:Spatial.orex:latitude ge 9.0, with a capital S.

🕵️ Expected behavior

I expected the response to return the exact case used in the PDS4 label.

📜 To Reproduce

Use api request https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:orex.ovirs:data_calibrated::11.0/members?limit=2

🖥 Environment Info

Seen in production

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

No response

⚙️ Engineering Details

No response

al-niessner commented 1 year ago

@tloubrieu-jpl

I need the document from OpenSearch for: urn:nasa:pds:orex.ovirs:data_calibrated:20180717t101957s996_ovr_blackbodyplusfilamentl2_v006.fits::1.0

I do not know where this instance of registry-api is getting its OpenSearch data from. Can you please provide the document? Thanks.

al-niessner commented 1 year ago

@tloubrieu-jpl

Oh, can you also provide the q example? It does not return either of the other two documents and so it may have a different case in opensearch.

jordanpadams commented 1 year ago

@al-niessner will send you the info for querying the registry directly.

jordanpadams commented 1 year ago

@al-niessner see email I sent from LFT regarding the login information to query the registries directly. This particular data belongs to the SBNPSI registry. https://github.com/NASA-PDS/registry/wiki/Registry-Workshop-06_28_2022#opensearch-endpoints

al-niessner commented 1 year ago

@jordanpadams @tloubrieu-jpl

I tried adding ?q=field ge 9.0 after members but got a URL error. Having the correct query would be super helpful.

With that I could compare the documents (records) and make sure they are both harvested with the same case. I searched the code and could not find a lowerCase() call, so I am guessing that it is just how the data is. You might want to start considering what to do if this is the case.

jordanpadams commented 1 year ago

@al-niessner it doesn't look like q= is doing anything when I try these queries:

https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:orex.ovirs:data_calibrated::11.0/members?q=(orex:spatial.orex:latitude%20ge%209.0)&limit=2

https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:orex.ovirs:data_calibrated::11.0/members?q=(orex:Spatial.orex:latitude%20ge%209.0)&limit=2
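For reference, a q expression contains spaces, colons, and parentheses, so it needs percent-encoding before it goes into the URL. Below is a minimal sketch using Python's standard library, assuming the base URL and field names from the queries above. `build_members_url` is a hypothetical helper for illustration, not part of registry-api.

```python
from urllib.parse import quote, urlencode

# Base URL as used in the queries in this thread.
BASE = "https://pds.nasa.gov/api/search/1"


def build_members_url(lidvid: str, q: str, limit: int = 2) -> str:
    """Return a /members URL with the q expression percent-encoded.

    Hypothetical helper for illustration only.
    """
    # quote_via=quote encodes spaces as %20 instead of '+'.
    params = urlencode({"q": q, "limit": limit}, quote_via=quote)
    return f"{BASE}/products/{lidvid}/members?{params}"


url = build_members_url(
    "urn:nasa:pds:orex.ovirs:data_calibrated::11.0",
    "(orex:Spatial.orex:latitude ge 9.0)",
)
print(url)
```

The same helper can be called with the lower-case spelling to compare the two responses side by side.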

jordanpadams commented 1 year ago

@al-niessner @tloubrieu-jpl I think this may actually be a quirk in their metadata. Both of these work, but they are associated with different products and/or versions of the data?

https://pds.nasa.gov/api/search/1/products?q=orex:spatial.orex:latitude%20ge%209.0

https://pds.nasa.gov/api/search/1/products?q=orex:Spatial.orex:latitude%20ge%209.0

al-niessner commented 1 year ago

@jordanpadams @tloubrieu-jpl

Well, that is a quirk that makes a mess. If they can typo in any case for them (sPatial instead of Spatial or spatial) then how do you want to fix that? We can lower case them in the DB, but then how do we return them to their desired case (I am sure someone out there is very specific in their casing of the words and will be annoyed if we lower case them all)? Then there is the issue that some user really wants to have foo and Foo and FOO all in the same document.
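To make the trade-off concrete, here is a rough sketch of lower-casing field names for lookup while keeping the original spellings so they can be returned unchanged. It is only an illustration under assumptions: `index_fields` and `collisions` are hypothetical helpers, and the sample field names are taken from this thread rather than from a real OpenSearch document.

```python
from collections import defaultdict


def index_fields(field_names):
    """Map each lower-cased field name to the original spellings seen."""
    variants = defaultdict(set)
    for name in field_names:
        variants[name.lower()].add(name)
    return variants


def collisions(variants):
    """Return the lower-cased keys that have more than one original spelling."""
    return {key: sorted(names) for key, names in variants.items() if len(names) > 1}


# Sample field names from this thread (not a real document).
fields = [
    "orex:Spatial.orex:latitude",
    "orex:spatial.orex:latitude",
    "lid",
]
variants = index_fields(fields)
print(collisions(variants))
```

A key with a single spelling can be returned in its original case. A key with several spellings is exactly the foo/Foo/FOO situation and still needs a policy decision.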

jordanpadams commented 1 year ago

@al-niessner I'm not sure how we handle this... OpenSearch cares about case, and for PDS4, those labels are actually invalid. This should never happen. I think we should actually treat those as entirely separate attributes for the time being. We will need to figure out "synonyms" under the hood later in some other wrapper mechanism.

For handling case, in general, unless I am missing something, I don't think there is really a way to handle this in OpenSearch?
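One way such a wrapper could look, purely as a sketch: before sending the query, expand a field name into an `or` over every case variant known to exist. The `known_fields` list, the `expand_field` helper, and the assumption that the q grammar accepts parenthesized clauses joined with `or` are all illustrative, not confirmed behavior of registry-api.

```python
def expand_field(field, op, value, known_fields):
    """Build a q expression that matches every known case variant of `field`.

    Hypothetical helper; `known_fields` would have to come from the registry.
    """
    matches = sorted(f for f in known_fields if f.lower() == field.lower())
    if not matches:
        # Nothing known under any casing: pass the name through unchanged.
        matches = [field]
    clauses = [f"({name} {op} {value})" for name in matches]
    return " or ".join(clauses)


known_fields = ["orex:Spatial.orex:latitude", "orex:spatial.orex:latitude"]
q = expand_field("OREX:SPATIAL.OREX:LATITUDE", "ge", "9.0", known_fields)
print(q)
```

This keeps the two spellings as separate attributes in the index, as suggested above, and only merges them at query time.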

al-niessner commented 1 year ago

@jordanpadams

Just triple checked. No, we would have to normalize the field names, which is the hoary mess we keep discussing.

https://discuss.elastic.co/t/case-insensitive-search-on-fieldnames-not-values/87097

tloubrieu-jpl commented 1 year ago

This ticket is not an issue with the api but with the documents loaded in the registry.

miguelp1986 commented 7 months ago

@tloubrieu-jpl since the issue is not with the API, but with the documents themselves, should this be an i&t.skip issue?