NASA-PDS / harvest

Standalone Harvest client application providing the functionality for capturing and indexing product metadata into the PDS Registry system (https://github.com/nasa-pds/registry).
https://nasa-pds.github.io/registry

ref_lid_* fields are not added to the Registry schema prior to load #127

Closed jordanpadams closed 4 months ago

jordanpadams commented 1 year ago

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

When I load data containing a ref_lid_* field that is not already in the registry schema, that field is not searchable.

🕵️ Expected behavior

I expected every ref_lid_* field to be searchable after the load.

📜 To Reproduce

This query for ref_lid_target works because we include that in our initial schema creation:

https://pds.nasa.gov/api/search/1/products?q=ref_lid_target%20eq%20%22urn:nasa:pds:context:target:planet.mercury%22

This query for ref_lid_associate does not (but should): https://pds.nasa.gov/api/search/1/products?q=ref_lid_associate%20eq%20%22urn:nasa:pds:context:node:node.imaging%22

Same for ref_lid_data (and it should work): https://pds.nasa.gov/api/search/1/products?q=ref_lid_data%20eq%20%22urn:nasa:pds:messenger_mdis_4001:bdr_rdr:mdis_bdr_256ppd_h07nw2%22
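
For anyone re-running the three checks above, here is a minimal sketch of how those query URLs are put together. The function name `build_query_url` is hypothetical and used only for illustration; the base URL and the `field eq "value"` query form are copied from the links in this report. It only builds the URL string and does not call the API.

```python
from urllib.parse import quote

# Base endpoint as it appears in the links in this issue.
SEARCH_URL = "https://pds.nasa.gov/api/search/1/products"


def build_query_url(field: str, lid: str) -> str:
    """Return a search URL for `field eq "lid"` (hypothetical helper)."""
    query = f'{field} eq "{lid}"'
    # Keep ':' unescaped so the LID stays readable, as in the links above;
    # spaces become %20 and double quotes become %22.
    return f"{SEARCH_URL}?q={quote(query, safe=':')}"


if __name__ == "__main__":
    print(build_query_url("ref_lid_target",
                          "urn:nasa:pds:context:target:planet.mercury"))
    print(build_query_url("ref_lid_associate",
                          "urn:nasa:pds:context:node:node.imaging"))
```

The first URL printed matches the working ref_lid_target link; the second matches the ref_lid_associate link that currently fails.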

🖥 Environment Info

📚 Version of Software Used

API v1.1, Harvest v3.7.6

🩺 Test Data / Additional context

No response

🦄 Related requirements

No response

⚙️ Engineering Details

No response

tloubrieu-jpl commented 7 months ago

Harvest needs to dynamically update the list of searchable ref_lid_* fields whenever they are found in the products. That should make the end of this configuration file https://github.com/NASA-PDS/registry-mgr/blob/main/src/main/resources/elastic/registry.json#L59 obsolete.

al-niessner commented 7 months ago

@jordanpadams @tloubrieu-jpl

Just to be clear, we now want to scan every product for ref_lid_* fields and add them to the index when found? When we say product, do we mean all products, or products excluding bundles and collections?

tloubrieu-jpl commented 7 months ago

Hi @al-niessner, we mean products of any class, including bundles and collections. And yes, we want all the ref_lid_* fields found (created by harvest?) to be added to the schema before the product is loaded into OpenSearch.
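
To make the requested behavior concrete, here is a rough sketch in Python, not the actual Harvest code (Harvest is Java). It assumes a product is a flat dict of field names to values and that the new fields would be mapped as `keyword`; both are assumptions, and the function names are hypothetical. The returned body follows the general shape of an OpenSearch put-mapping request, but nothing here talks to a cluster.

```python
from typing import Any, Dict, Iterable, List

REF_LID_PREFIX = "ref_lid_"


def missing_ref_lid_fields(product: Dict[str, Any],
                           known_fields: Iterable[str]) -> List[str]:
    """List ref_lid_* fields in the product that the schema lacks."""
    known = set(known_fields)
    return sorted(name for name in product
                  if name.startswith(REF_LID_PREFIX) and name not in known)


def mapping_update_body(fields: Iterable[str],
                        field_type: str = "keyword") -> Dict[str, Any]:
    """Build a put-mapping style body; the 'keyword' type is an assumption."""
    return {"properties": {name: {"type": field_type} for name in fields}}


if __name__ == "__main__":
    # Hypothetical product and schema state, for illustration only.
    product = {
        "lid": "urn:nasa:pds:example",
        "ref_lid_target": ["urn:nasa:pds:context:target:planet.mercury"],
        "ref_lid_associate": ["urn:nasa:pds:context:node:node.imaging"],
    }
    schema_fields = ["lid", "ref_lid_target"]
    new_fields = missing_ref_lid_fields(product, schema_fields)
    print(new_fields)
    print(mapping_update_body(new_fields))
```

The point is the ordering: compute the missing fields and update the schema first, then load the product, and apply this to every product class, bundles and collections included.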

al-niessner commented 7 months ago

@tloubrieu-jpl

I did a find/grep over all the Java files in harvest and ref_lid fields are not created there. It turns out they are created in registry-common, but very, very far away from any connection to a DB. So the simplest thing is to push the check back into harvest, which knows its connection, and do it there. Covering all products makes it much simpler. Thanks.

tloubrieu-jpl commented 7 months ago

That sounds good, thanks @al-niessner