NASA-PDS / harvest

Standalone Harvest client application providing the functionality for capturing and indexing product metadata into the PDS Registry system (https://github.com/nasa-pds/registry).
https://nasa-pds.github.io/registry

OpenSearch mapping conflict issue when trying to change a type (`[illegal_argument_exception]`) #204

Open tloubrieu-jpl opened 1 month ago

tloubrieu-jpl commented 1 month ago

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

Errors found in log: While loading: https://pds-geosciences.wustl.edu/insight/urn-nasa-pds-insight_seis/data/xb/continuous_waveform/elyhk/2019/068/xb.elyhk.19.uk1.2019.068.6.a.xml

[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]

🕵️ Expected behavior

I expected the product to load without error.

📜 To Reproduce

Load the selected product in production.

🖥 Environment Info

No response

📚 Version of Software Used

harvest 4.0.2

🩺 Test Data / Additional context

See full log section provided by GEO node (Dan Scholes):

[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1860.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.19.uk1.2019.218.3.a.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1850.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.19.uk1.2019.218.3.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1860.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.20.uk2.2019.218.3.a.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1850.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.20.uk2.2019.218.3.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1860.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.21.uea.2019.218.3.a.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1850.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.21.uea.2019.218.3.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1860.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.22.uk2.2019.218.3.a.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

al-niessner commented 1 month ago

@tloubrieu-jpl

Not sure what the response is here. It seems the opensearch mapping already thinks insight:Observation_Information/insight:release_number is a keyword. Once a mapping has been set it cannot be changed without a remap, hence the error. Did you want harvest to remember who set insight:Observation_Information/insight:release_number to a keyword? Did you want harvest to do a remap?
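To make the failure mode concrete, here is a minimal sketch of the kind of check that produces this error. It is an in-memory model only: `find_mapping_conflicts` and the flat `field -> type` dictionaries are hypothetical names for illustration, not harvest or OpenSearch code.

```python
def find_mapping_conflicts(existing, requested):
    """Return (field, old_type, new_type) for every field whose type would change.

    Both arguments are plain dicts of field name -> type name. This models the
    rule that an existing field's type cannot be altered in place.
    """
    conflicts = []
    for field, new_type in requested.items():
        old_type = existing.get(field)
        # A field that is not mapped yet can be added freely; only a change
        # to an already-mapped field is a conflict.
        if old_type is not None and old_type != new_type:
            conflicts.append((field, old_type, new_type))
    return conflicts


FIELD = "insight:Observation_Information/insight:release_number"
existing = {FIELD: "keyword"}   # what the index already holds
requested = {FIELD: "integer"}  # what the LDD now says

for field, old, new in find_mapping_conflicts(existing, requested):
    print(f"mapper [{field}] cannot be changed from type [{old}] to [{new}]")
```

Running a check like this before submitting the schema update would let harvest report the conflict itself instead of relaying the server's `illegal_argument_exception`.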

tloubrieu-jpl commented 1 month ago

Hi @al-niessner ,

I guess the LDD was not available at first, so the default keyword type was assigned to the property.

That is a case which should occur frequently, but I believe a sweeper or a central service of some sort should update the mapping and do the remap.

This is not preventing harvest from loading the product anyway?

al-niessner commented 1 month ago

@tloubrieu-jpl

No idea about how or why it started as a keyword; I thought strings were the default with keywords for select items that have id in them but maybe not. As the mapping fills out, re-mappings should only result from changes in the PDS schema.

I do not know if there is a sweeper that does the remapping. Somebody used to do it by hand but it may have made it to a sweeper.

harvest cannot ingest material into opensearch if the types do not match. In this case, it could probably get away with it, but in general it cannot. The most pathological case: revision_number is an integer, then the choice is made to move to semantic versioning, so it becomes a string. There is no way to then push 1.2.3 into a field that opensearch thinks is an integer. Therefore, remapping has to take place before the document can be ingested.
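A rough sketch of why the direction of the type change matters. `fits_mapped_type` is a hypothetical helper and a deliberately simplified model covering only two types; it is not OpenSearch's actual coercion logic.

```python
def fits_mapped_type(value, mapped_type):
    """Simplified model: can `value` be stored in a field of `mapped_type`?"""
    if mapped_type == "keyword":
        # Any scalar has a string form, so a keyword field can hold it.
        return True
    if mapped_type == "integer":
        try:
            int(str(value))
            return True
        except ValueError:
            # e.g. "1.2.3" has no integer interpretation
            return False
    raise ValueError(f"type not modelled in this sketch: {mapped_type}")


# Integer-like value into a field mapped as keyword: this is the case here.
print(fits_mapped_type(3, "keyword"))
# Semantic version into a field mapped as integer: the pathological case.
print(fits_mapped_type("1.2.3", "integer"))
```

In this model the first case is storable and the second is not, which is why a remap has to come before ingestion in the general case.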

tloubrieu-jpl commented 4 weeks ago

To be continued; this issue requires serious design work. Options are:

alexdunnjpl commented 1 week ago

It seems like, in the revision_number case, a move from int to semver would justify using a differently-named field? If it's a problem elsewhere though, agreed.

From what @al-niessner is saying, there is no way to cast arbitrary field values to string-likes.

Changing a mapping is, at this point (AOSS), a significant undertaking, requiring an ad-hoc migration of all documents since the reindex operation is not available. So whatever behaviour we implement should bias heavily toward ensuring that mappings only need to be altered when absolutely necessary (i.e. prefer not creating a mapping over creating a mapping with a default value).

So if harvest is applying defaults, that should be changed and the field be omitted from the mappings submitted to OpenSearch, and the user be made aware that someField will not be searchable until it is available in a published DD (at which point the reindexing sweeper will handle fixing that document).

al-niessner commented 1 week ago

I do not think it will add 'field' to documents already in an index when 'field' is added. In that case, it will still require a re-index.

I concur - ditch harvest default. Quit until LDD is updated even if we read a local LDD because that puts it on the user side.

tloubrieu-jpl commented 6 days ago

I see your point @al-niessner, but I would wait for @jordanpadams's feedback on that, because if we go that route, I agree it makes things simpler on our side, but it will also slow the integration of products into our registry.

So it depends on whether we want:

  1. as many products as we can, although not fully ready on the user side (and then not as searchable, with more dev work to handle these cases)
  2. to wait for the products to be fully validated.

tloubrieu-jpl commented 1 day ago

@al-niessner, as a conclusion on this ticket, we need to make sure harvest does not assign a default mapping type to fields which have no LDD, because it is very complicated to change the type of an indexed field in the opensearch we use.

The product with unknown type fields should be loaded but the unknown type fields must not be added to the mapping.
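A minimal sketch of that behaviour, assuming the types from loaded LDDs are available as a simple lookup. `build_mapping_update`, `ldd_types` and the `example:` field names are hypothetical, and the log level used is only a placeholder.

```python
import logging

logger = logging.getLogger("harvest-sketch")  # hypothetical logger name


def build_mapping_update(product_fields, ldd_types):
    """Split a product's fields into typed mappings and unmapped field names.

    product_fields: field names found in the product label.
    ldd_types: dict of field name -> type, taken from the loaded LDDs.
    Fields with no LDD type get NO default; they are left out of the mapping.
    """
    mapping = {}
    unmapped = []
    for field in product_fields:
        field_type = ldd_types.get(field)
        if field_type is None:
            unmapped.append(field)
            logger.warning(
                "%s has no LDD type; it will not be searchable "
                "until it is available in a published LDD", field)
        else:
            mapping[field] = {"type": field_type}
    return mapping, unmapped


mapping, unmapped = build_mapping_update(
    ["example:Some_Class/example:known_field",
     "example:Some_Class/example:unknown_field"],
    {"example:Some_Class/example:known_field": "integer"},
)
```

Here the product would still be loaded, `mapping` would be the only thing sent as a schema update, and `unmapped` lists the fields the user needs to be told about.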

al-niessner commented 10 hours ago

@tloubrieu-jpl

Do you want the message telling the user that it will not be searchable as an error, or a warning? Nobody reads warnings or below until it is too late, but for harvest it is clearly a warning while for the DB behavior it is clearly an error.