Open vchendrix opened 4 months ago
Thanks for the report, @vchendrix. This could be related to #2167 and certainly seems to be in the same category of character encoding problems. As with that bug, the error handling pipeline in MetacatUI seems to miss that Metacat produces an error, and silently moves on. This has been a common thread and involves data loss, so I am going to label this as critical. I will discuss this with @robyngit and @rushirajnenuji to try to figure out a path forward. Thanks.
No problem. The solution will probably be the same in MetacatUI. The only noticeable difference is that in this case Metacat accepts the update but then fails to parse the EML for the Solr index, which was very difficult to remedy. In #2167 Metacat rejects the update, which makes it easier to recover.
@vchendrix could you attach to this ticket the original EML document that triggers this Solr indexing error? It would be very helpful to be able to reproduce what you mean by "invalid characters" with a concrete reproducible example.
Here is the URL: https://data.ess-dive.lbl.gov/catalog/d1/mn/v2/object/ess-dive-3619bd077a60b7c-20240624T120319367
The_importance_of_accounting_for_landscape.xml
@mbjones NOTE that once the file is opened in an editor, the characters are automatically re-encoded, and I was then able to upload it and have it parse successfully. The characters were garbage, but this sidestepped the error. The invalid characters, I suspect, are in Step 7 of the Methods.
Description
The MetacatUI editor allowed invalid characters to be saved in metadata. When the Metacat indexer tried to process the metadata file, the following error was encountered:
The result was that the dataset metadata was not indexed in Solr. However, the resource map was created successfully, rendering the dataset uneditable. The metadata in Solr looked as follows:
Steps to Reproduce
Expected behavior
The metadata should be properly encoded as UTF-8 before being saved, ensuring that it can be indexed without errors.
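As a rough illustration of the kind of pre-save check this implies, here is a minimal Python sketch. It assumes the "invalid characters" are ones outside the XML 1.0 `Char` production (for example, stray control characters pasted into a text field); that is a guess, since the exact offending characters have not been confirmed in this thread. The function names are hypothetical and are not part of MetacatUI or Metacat.

```python
import re

# XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything outside these ranges cannot appear in a well-formed XML 1.0 document.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [
        (match.start(), "U+{:04X}".format(ord(match.group())))
        for match in _ILLEGAL_XML_CHARS.finditer(text)
    ]


def strip_illegal_xml_chars(text):
    """Return text with XML-1.0-illegal characters removed (hypothetical helper)."""
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Reporting the offsets, rather than silently stripping, would let an editor point the user at the offending field (here, possibly Step 7 of the Methods) before anything is sent to Metacat.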
Screenshots
Additional context
We recovered from this by using the API directly to upload a new metadata file that is parseable by the Metacat indexer and then manually creating the resource map. This fixed the issue enough to allow the dataset to be edited and published. However, the previous version is in a state where it will never be properly indexed. The MetacatUI metadata editor should ensure that the metadata is encoded properly as UTF-8.
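For anyone repeating this kind of recovery, a quick local check before uploading a replacement file through the API might look like the sketch below. It only approximates what the indexer does: it verifies that the bytes decode as UTF-8 and that Python's standard-library XML parser accepts them, which is an assumption about the failure mode rather than a reproduction of Metacat's own parsing. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_metadata_bytes(data):
    """Return a list of problems found in raw metadata bytes (empty if none)."""
    problems = []

    # 1. The bytes must decode cleanly as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(
            "not valid UTF-8 at byte {}: {}".format(err.start, err.reason)
        )

    # 2. The document must be well-formed XML; the parser also rejects
    #    characters that XML 1.0 does not allow, such as most control characters.
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        problems.append("not well-formed XML: {}".format(err))

    return problems
```

Usage would be along the lines of `check_metadata_bytes(open(path, "rb").read())`, reading the file in binary mode so that no editor or text layer re-encodes the characters first.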