Open tloubrieu-jpl opened 1 month ago
@tloubrieu-jpl
Not sure what the response is here. It seems the opensearch mapping already thinks insight:Observation_Information/insight:release_number is a keyword. Once a mapping has been set it cannot be changed without a remap, hence the error. Did you want harvest to remember who set insight:Observation_Information/insight:release_number to a keyword? Did you want harvest to do a remap?
Hi @al-niessner ,
I guess the LDD was not available at first, so the default keyword type was assigned to the property.
That is a case which should occur frequently, but I believe sweeper or a central service of some sort should update the mapping and do the remap.
This is not preventing harvest from loading the product anyway, is it?
@tloubrieu-jpl
No idea how or why it started as a keyword; I thought strings were the default, with keywords only for select items that have id in them, but maybe not. As the mapping fills out, re-mappings should only result from changes in the PDS schema.
I do not know if there is a sweeper that does the remapping. Somebody used to do it by hand but it may have made it to a sweeper.
harvest cannot ingest material into opensearch if the types do not match. In this case we could probably get away with it but, in general, we cannot. The most pathological case: revision_number is an integer, then the choice is made to move to semantic versioning, so it becomes a string. There is no way to then push 1.2.3 into a field that opensearch thinks is an integer. Therefore, remapping has to take place before the document can be ingested.
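To make the failure concrete, here is a minimal Python sketch of the kind of check that ends in the rejected request. The function name `find_mapping_conflicts` and the flat field-to-type dictionaries are assumptions for illustration only; this is not harvest code or an OpenSearch API.

```python
def find_mapping_conflicts(existing, incoming):
    """Return (field, existing_type, incoming_type) for every field whose
    requested type differs from the type already fixed in the index mapping."""
    conflicts = []
    for field, new_type in incoming.items():
        old_type = existing.get(field)
        # A field that is not mapped yet cannot conflict; only a changed type can.
        if old_type is not None and old_type != new_type:
            conflicts.append((field, old_type, new_type))
    return conflicts


# Types already in the index (hypothetical snapshot, using the field from this ticket).
existing = {"insight:Observation_Information/insight:release_number": "keyword"}
# Types harvest would like to submit once the LDD is known.
incoming = {"insight:Observation_Information/insight:release_number": "integer"}

conflicts = find_mapping_conflicts(existing, incoming)
for field, old, new in conflicts:
    print(f"mapper [{field}] cannot be changed from type [{old}] to [{new}]")
```

Running a check like this before submitting would let harvest report the conflict itself; it does not remove the need for a remap.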
To be continued; this issue requires a serious design. Options are:
It seems like, in the revision_number case, a move from int to semver would justify using a differently-named field? If it's a problem elsewhere though, agreed.
From what @al-niessner is saying, there is no way to cast arbitrary field values to string-likes.
Changing a mapping is at this point (AOSS), a significant undertaking, requiring an ad-hoc migration of all documents since the reindex operation is not available. So whatever behaviour we implement should bias heavily toward ensuring that mappings only need to be altered when absolutely necessary (i.e. prefer not creating a mapping over creating a mapping with a default value).
So if harvest is applying defaults, that should be changed: the field should be omitted from the mappings submitted to OpenSearch, and the user made aware that someField will not be searchable until it is available in a published DD (at which point the reindexing sweeper will handle fixing that document).
I do not think documents already in an index will pick up 'field' when 'field' is added to the mapping. In that case, it will still require a re-index.
I concur - ditch harvest default. Quit until LDD is updated even if we read a local LDD because that puts it on the user side.
I see your point @al-niessner , but I would wait for @jordanpadams' feedback on that because if we go that route, I agree it makes things simpler on our side, but it will also slow the integration of products in our registry.
So it depends if we want :
@al-niessner, as a conclusion on this ticket, we need to make sure harvest does not assign a default mapping type for fields which have no LDD, because it is very complicated to change the type of an indexed field in the opensearch we use.
The product with unknown type fields should be loaded but the unknown type fields must not be added to the mapping.
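A rough Python sketch of that conclusion, for discussion only. `split_fields_by_known_type`, `dd_types`, and the `ns:` field names are hypothetical, and the warning level is a placeholder. It also assumes the index does not dynamically map unmapped fields (for example `dynamic` set to false), which this thread does not confirm.

```python
import logging

logger = logging.getLogger("harvest_sketch")  # hypothetical logger name


def split_fields_by_known_type(product_fields, dd_types):
    """Build a mapping update only for fields whose type is known from a DD.

    Fields without a known type are left out of the mapping entirely instead
    of receiving a default type, and are returned so the caller can report them.
    """
    mapping_update = {}
    unmapped = []
    for field in product_fields:
        if field in dd_types:
            mapping_update[field] = {"type": dd_types[field]}
        else:
            unmapped.append(field)
            logger.warning(
                "%s has no type in an available DD; it will not be searchable "
                "until the DD is published", field)
    return mapping_update, unmapped


# Hypothetical input: one field with a known type, one without.
dd_types = {"ns:known_field": "integer"}
product_fields = ["ns:known_field", "ns:unknown_field"]

mapping_update, unmapped = split_fields_by_known_type(product_fields, dd_types)
```

The product is still loaded in full; only `mapping_update` would be submitted to OpenSearch, so no type is fixed prematurely for the fields in `unmapped`.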
@tloubrieu-jpl
Do you want the message telling the user that it will not be searchable as an error, or a warning? Nobody reads warnings or below until it is too late, but for harvest it is clearly a warning while for the DB behavior it is clearly an error.
Checked for duplicates
No - I haven't checked
🐛 Describe the bug
Errors found in log: While loading: https://pds-geosciences.wustl.edu/insight/urn-nasa-pds-insight_seis/data/xb/continuous_waveform/elyhk/2019/068/xb.elyhk.19.uk1.2019.068.6.a.xml
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
🕵️ Expected behavior
I expected the product to load without error.
📜 To Reproduce
Load the selected product in production.
🖥 Environment Info
No response
📚 Version of Software Used
harvest 4.0.2
🩺 Test Data / Additional context
See full log section provided by GEO node (Dan Scholes):
🦄 Related requirements
🦄 #xyz
⚙️ Engineering Details
No response
🎉 Integration & Test
No response