NASA-PDS / harvest

Standalone Harvest client application providing the functionality for capturing and indexing product metadata into the PDS Registry system (https://github.com/nasa-pds/registry).
https://nasa-pds.github.io/registry

I want to update the OpenSearch schema whatever the number of fields to be updated #190

Closed tloubrieu-jpl closed 1 month ago

tloubrieu-jpl commented 1 month ago

Checked for duplicates

No - I haven't checked

πŸ› Describe the bug

From Dan Scholes (GEO node): When I harvested one bundle, I got this error message:

[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] Limit of total fields [1000] has been exceeded

After this error, the bundle that I am trying to load into the registry is not loaded; see the log:

[SUMMARY] Reading configuration from E:\opt\configs24b\urn-nasa-pds-a17fuvs.xml
[SUMMARY] Output directory: /tmp/harvest/out

Except for the first time it tried:

[ERROR] Request failed: [illegal_argument_exception] Limit of total fields [1000] has been exceeded
[INFO] Wrote 4454 product(s)
[SUMMARY] Summary:
[SUMMARY] Skipped files: 0
[SUMMARY] Loaded files: 4454
[SUMMARY]   Product_Bundle: 1
[SUMMARY]   Product_Collection: 6
[SUMMARY]   Product_Observational: 4447
[SUMMARY] Failed files: 4121
[SUMMARY] Package ID: 106a3013-01d8-4cad-a71c-76ec8f250099
[SUMMARY] Reading configuration from E:\opt\configs24b\clem1-gravity-topo-v1.xml
[SUMMARY] Output directory: /tmp/harvest/out

Full log: processingLogsBatch2.txt.zip

🕵️ Expected behavior

I expected no limitation on updating the OpenSearch schema when I use harvest.

📜 To Reproduce

No response

🖥 Environment Info

📚 Version of Software Used

4.0.1

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

jordanpadams commented 1 month ago

@tloubrieu-jpl is there a workaround for this or is this a critical bug?

al-niessner commented 1 month ago

@tloubrieu-jpl

Can I get the bundle and harvest config file? How big (bytes) is it including folders etc?

I searched the Java code (harvest, registry-mgr, and registry-common) and the string 'total fields' does not appear anywhere. For now, that suggests it is a serverless limit, not ours. We may need to introduce paging of writes where it was not needed previously.

tloubrieu-jpl commented 1 month ago

@al-niessner yes, I believe this is an OpenSearch Serverless limit; we might have to update the schema in multiple chunks, if that is possible.
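
A minimal sketch of the "multiple chunks" idea, assuming the new fields are available as a dict of field name to mapping definition. `chunk_mappings`, the batch size, and the field names are hypothetical and are not taken from harvest or registry-mgr; each yielded body has the shape accepted by the `PUT /<index>/_mapping` API. Batching only shrinks each request. It would not, by itself, get around a limit on the total number of fields in the index.

```python
from itertools import islice
from typing import Dict, Iterator


def chunk_mappings(fields: Dict[str, dict], batch_size: int = 500) -> Iterator[dict]:
    """Yield put-mapping request bodies holding at most batch_size fields each."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    items = iter(fields.items())
    while True:
        # Take the next batch_size (name, definition) pairs, if any remain.
        batch = dict(islice(items, batch_size))
        if not batch:
            return
        yield {"properties": batch}


# Hypothetical field names, for illustration only.
new_fields = {f"ns:field_{i}": {"type": "keyword"} for i in range(1200)}
bodies = list(chunk_mappings(new_fields, batch_size=500))
print([len(b["properties"]) for b in bodies])  # [500, 500, 200]
```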

jordanpadams commented 1 month ago

Standup status: Started working this AM. LDD loading is not working correctly, but on the hunt.

tloubrieu-jpl commented 1 month ago

Sorry for the late response. This is indeed critical, and I cannot think of a workaround.

tloubrieu-jpl commented 1 month ago

@sjoshi-jpl @al-niessner could the error in this ticket be related to this configuration setting? https://www.elastic.co/guide/en/elasticsearch/reference/master/mapping-settings-limit.html

I remember that, in the early design of the registry, we were warned about using too many fields, but we did not have issues with that. It feels to me like this configured limit is a safety net to prevent us from having too many fields in our index. We would need to increase it, although I am not sure what the new value should be. 5000?

@jordanpadams would you know how many properties are defined in PDS4?
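
To help pick a new value, it may be useful to count how many fields an index mapping already holds. A rough sketch, assuming the mapping has already been fetched (for example with `GET /<index>/_mapping`) and unwrapped to the dict that contains `properties`. `count_mapped_fields` is a hypothetical helper, the sample mapping is made up, and the result is only an approximation of what the server counts toward `index.mapping.total_fields.limit`.

```python
def count_mapped_fields(mapping: dict) -> int:
    """Approximate the number of mapping entries that count toward the limit."""
    total = 0
    for definition in mapping.get("properties", {}).values():
        total += 1  # the field or object itself
        total += len(definition.get("fields", {}))  # multi-fields such as .keyword
        total += count_mapped_fields(definition)  # sub-properties of object fields
    return total


# Hypothetical mapping fragment; the field names are illustrative only.
sample = {
    "properties": {
        "lid": {"type": "keyword"},
        "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "ops": {"properties": {"node_name": {"type": "keyword"}}},
    }
}
print(count_mapped_fields(sample))  # 5
```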

tloubrieu-jpl commented 1 month ago

As part of this fix, I also ran this query on our OpenSearch server:

PUT /*-registry/_settings 
{ "index.mapping.total_fields.limit": 3000 } 

This parameter was set in the previous version of our OpenSearch configuration, as shown here: https://github.com/NASA-PDS/registry-mgr/blob/881b696c5a1ab84c6d2fd431a2823555b1d322f9/src/main/resources/elastic/registry.json#L4
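
For anyone who wants to script the same settings update, here is a sketch that builds the equivalent HTTP request with the Python standard library. `build_limit_request` and the base URL are assumptions for illustration; the request is only constructed, not sent, and authentication (which a real OpenSearch deployment will require) is left out.

```python
import json
import urllib.request


def build_limit_request(base_url: str, index_pattern: str, limit: int) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request raising the field limit."""
    body = json.dumps({"index.mapping.total_fields.limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_pattern}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder host; replace with the real endpoint and add authentication before sending.
req = build_limit_request("https://localhost:9200", "*-registry", 3000)
print(req.get_method(), req.full_url)
```

Sending it would be a call to `urllib.request.urlopen(req)` once credentials are attached.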