DSpace / DSpace

(Official) The DSpace digital asset management system that powers your Institutional Repository
https://wiki.lyrasis.org/display/DSDOC8x/
BSD 3-Clause "New" or "Revised" License

DOI Organizer: make it easier to identify offending item + what's wrong with the item #9827

Open bram-atmire opened 1 month ago

bram-atmire commented 1 month ago

Describe the bug

When the DOI Organizer errors out, a typical error signature looks like:

2024-09-13 06:10:45,583 WARN  unknown unknown org.dspace.identifier.doi.DataCiteConnector @ While reserving the DOI doi:10.13025/14703, we got a http status code 422 and the message "DOI 10.13025/14703: This element is not expected. Expected is one of ( {http://datacite.org/schema/kernel-4}creators, {http://datacite.org/schema/kernel-4}titles, {http://datacite.org/schema/kernel-4}publisher, {http://datacite.org/schema/kernel-4}publicationYear, {http://datacite.org/schema/kernel-4}resourceType, {http://datacite.org/schema/kernel-4}subjects, {http://datacite.org/schema/kernel-4}contributors, {http://datacite.org/schema/kernel-4}dates, {http://datacite.org/schema/kernel-4}language, {http://datacite.org/schema/kernel-4}alternateIdentifiers ). at line 4, column 0".
2024-09-13 06:10:47,664 ERROR unknown unknown org.dspace.identifier.doi.DOIOrganiser @ It wasn't possible to update this identifier:  doi:10.13025/14703 Exceptions code:  BAD_ANSWER
org.dspace.identifier.doi.DOIIdentifierException: Unable to parse an answer from DataCite API. Please have a look into DSpace logs.
    at org.dspace.identifier.doi.DataCiteConnector.reserveDOI(DataCiteConnector.java:467) ~[dspace-api-7.6.jar:7.6]
    at org.dspace.identifier.doi.DataCiteConnector.updateMetadata(DataCiteConnector.java:538) ~[dspace-api-7.6.jar:7.6]
    at org.dspace.identifier.doi.DOIOrganiser.update(DOIOrganiser.java:571) [dspace-api-7.6.jar:7.6]
    at org.dspace.identifier.doi.DOIOrganiser.runCLI(DOIOrganiser.java:271) [dspace-api-7.6.jar:7.6]
    at org.dspace.identifier.doi.DOIOrganiser.main(DOIOrganiser.java:103) [dspace-api-7.6.jar:7.6]

There are two problems with this:

  1. If the DOI is not yet on a live item, it is hard to identify the offending item from doi:10.13025/14703 alone, because that requires a lookup in the DOI table. It would help if the item UUID were logged alongside the error.

  2. The error message doesn't make clear what is wrong with the item's metadata. Causes we have seen include null or empty metadata values for a specific field and the same DOI being present twice, but there are other cases as well.

It would help a lot if the error statement made clear which metadata field or value is causing the problem.

To Reproduce

Steps to reproduce the behavior:

  1. Make sure DOI registration is configured and active for new items
  2. Submit an item with an empty value for a field such as dc.contributor.author, or put the same DOI value in two separate instances of dc.identifier.uri

Expected behavior

The offending item and offending metadata field should be clear from the log, so that the errors are easily resolved.
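
To illustrate the kind of log line meant here, a minimal sketch in plain Java. `DoiErrorMessage` and `describe` are hypothetical names, not existing DSpace API, and how the item UUID would be looked up inside DOIOrganiser is not shown; the sketch only assumes that the DOI, the item UUID, the HTTP status and the registrar's message are all available where the error is logged.

```java
import java.util.UUID;

// Hypothetical helper, not part of DSpace: builds one log line that carries
// the DOI, the owning item's UUID and the registrar's HTTP status together.
class DoiErrorMessage {
    static String describe(String doi, UUID itemId, int httpStatus, String registrarMessage) {
        // The item may not be resolvable; say so instead of failing.
        String item = (itemId == null) ? "unknown" : itemId.toString();
        return String.format(
            "Could not update %s (item %s): registrar returned HTTP %d: %s",
            doi, item, httpStatus, registrarMessage);
    }

    public static void main(String[] args) {
        // Example UUID made up for the demonstration.
        UUID itemId = UUID.fromString("00000000-0000-0000-0000-000000000001");
        System.out.println(
            describe("doi:10.13025/14703", itemId, 422, "This element is not expected."));
    }
}
```

With the UUID in the same line as the DOI, the item can be opened directly without first querying the DOI table.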

mwoodiupui commented 1 month ago

The cited error response is mostly the error message that the registrar's service got from some schema-driven XML parser. We shouldn't depend on knowing or guessing what parser they use today, so it is risky to do more than display the document we sent together with their response.

We could acquire a copy of the schema that we intend to follow in our crosswalk, and parse the crosswalk output with it to check validity before sending. That might give us a chance to display better information about any syntactic problems.
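
A rough sketch of such a pre-check, using the JDK's standard `javax.xml.validation` API. `CrosswalkPrecheck` and `validate` are hypothetical names, and the tiny inline XSD is only a stand-in for the real schema so the example runs on its own; in practice the schema file the crosswalk targets would be loaded instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of DSpace: validates a crosswalk result
// against an XSD and collects every problem with its line and column.
class CrosswalkPrecheck {
    static List<String> validate(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { add(e); }
            // Recording without rethrowing lets validation continue past the first error.
            public void error(SAXParseException e) { add(e); }
            public void fatalError(SAXParseException e) throws SAXException { add(e); throw e; }
            private void add(SAXParseException e) {
                problems.add("line " + e.getLineNumber() + ", column "
                    + e.getColumnNumber() + ": " + e.getMessage());
            }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Fatal errors are normally recorded by the handler already.
            if (problems.isEmpty()) {
                problems.add(String.valueOf(e.getMessage()));
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in schema (assumption): a resource with one non-empty title.
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='resource'><xs:complexType><xs:sequence>"
            + "<xs:element name='title'><xs:simpleType>"
            + "<xs:restriction base='xs:string'><xs:minLength value='1'/></xs:restriction>"
            + "</xs:simpleType></xs:element>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";
        // An empty title stands in for an empty metadata value.
        for (String p : validate(xsd, "<resource><title></title></resource>")) {
            System.out.println(p);
        }
    }
}
```

The messages still come from whatever validator the JDK ships, but they would be produced locally, before anything is sent to the registrar.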

mwoodiupui commented 1 month ago

Second thought: rather than generating a schema-driven parser just to check for problems, simply log the metadata document as part of the error message and let people use their preferred tools to examine it.
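
Roughly what that could look like, as a sketch only. `SentDocumentLog` and `withDocument` are hypothetical names, not DSpace API; the re-indentation step uses the JDK's standard `javax.xml.transform` classes and is optional, with a fallback to the raw text when the document cannot be parsed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of DSpace: appends the document that was
// sent to the registrar to the error message, so it ends up in the log.
class SentDocumentLog {
    static String withDocument(String doi, String registrarMessage, String sentXml) {
        String body;
        try {
            // Identity transform, used here only to re-serialize with indentation.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(sentXml)), new StreamResult(out));
            body = out.toString();
        } catch (Exception e) {
            // If the document cannot be re-serialized, log it verbatim.
            body = sentXml;
        }
        return "Registrar rejected " + doi + ": " + registrarMessage
            + System.lineSeparator() + "Document sent:" + System.lineSeparator() + body;
    }

    public static void main(String[] args) {
        System.out.println(withDocument(
            "doi:10.13025/14703",
            "This element is not expected.",
            "<resource><title>Example</title></resource>"));
    }
}
```

Note that re-indenting shifts line and column positions, so if the registrar's "at line 4, column 0" is to be matched against the log, the document should be logged exactly as sent.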