Describe the bug
When reading DC XML, attributes on certain relatedItem properties will appear as hashes in DC JSON. This causes indexing errors in lupo because ES expects keywords for certain relatedItem properties and receives hashes.
Expected Behaviour
Only the text content of each relatedItem property appears as its value in DC JSON; attributes are not carried over as a hash.
Steps to Reproduce
Read a DC XML relatedItem property like this:
<volume xml:lang="en">RR-175</volume>
It will appear like this in DC JSON:
"volume": { "lang": "en", "__content__": "RR-175" },
Context (Environment)
This affects indexing in lupo: DOI metadata updates return 500 errors (e.g. 10.4224/40002814), and certain DOIs do not appear in the index (e.g. 10.4224/40002702). See this Sentry error for the former.
Proposal
Hypothesis
Possible Implementation
The code here will likely need to be changed to accommodate the possibility of attributes:
https://github.com/datacite/bolognese/blob/master/lib/bolognese/readers/datacite_reader.rb#L197-L243
At some point, the XSD might be modified to exclude the possibility of attributes for these properties.
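One way to accommodate attributes could be sketched as below. This is only an illustration, not the existing bolognese code: `content_of`, `normalize_related_item` and the `SCALAR_KEYS` list are hypothetical names, and the set of affected properties is an assumption that would need checking against the schema and the lupo index mapping.

```ruby
# Hypothetical list of relatedItem properties that ES indexes as keywords.
# Assumption for illustration only; verify against the actual mapping.
SCALAR_KEYS = %w[volume issue firstPage lastPage].freeze

# Hypothetical helper: return the text content of a parsed XML element,
# whether it was parsed as a plain string or as a hash with attributes.
def content_of(element)
  case element
  when Hash then element["__content__"]
  when String then element
  end
end

# Replace attribute hashes with their text content for the scalar keys only,
# leaving every other property of the relatedItem untouched.
def normalize_related_item(item)
  item.each_with_object({}) do |(key, value), result|
    result[key] = SCALAR_KEYS.include?(key) ? content_of(value) : value
  end
end

related_item = {
  "volume" => { "lang" => "en", "__content__" => "RR-175" },
  "issue" => "3",
  "titles" => [{ "title" => "Example report" }]
}

normalized = normalize_related_item(related_item)
```

With this approach the `volume` value from the example above would become the plain string `RR-175`, while nested properties such as `titles` pass through unchanged.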