Open cmacdonald opened 7 years ago
Removing that particular <element> has no impact on a test parse. This is xoai created by a DSpace repository. I'm puzzled.
Could you create a minimum test case with the failing example (and the XML not retrieved from the site, so the example is stable)?
Could you also confirm if the XML is OAI valid?
Minimal test case - inserting inline as GH won't take an XML attachment.
As to its validity, I can confirm it's created by a DSpace instance. Do you have an XOAI validator?
Will also update issue title.
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2017-07-31T20:10:46Z</responseDate>
<request verb="GetRecord" identifier="oai:repository.somewhere.ac.uk:10373/1861"
metadataPrefix="xoai">http://repository.somewhere.ac.uk/oai/request</request>
<GetRecord>
<record>
<header>
<identifier>oai:repository.somewhere.ac.uk:9999/999</identifier>
<datestamp>2015-02-03T17:41:40Z</datestamp>
<setSpec>com_10373_3</setSpec>
<setSpec>col_10373_12</setSpec>
</header>
<metadata>
<metadata xmlns="http://www.lyncode.com/xoai">
<element name="dc">
<!-- either this block -->
<element name="contributor">
<element name="author">
<element name="none">
<field name="value">Author1, First A.</field>
<field name="value">Author2, Second</field>
<field name="value">Author3, Third</field>
</element>
</element>
</element>
<!-- or this following commented block -->
<!--
<element name="relation">
<element name="ispartof">
<element name="en">
<field name="value">Another article 6(4)</field>
</element>
</element>
</element>
-->
</element>
</metadata>
</metadata>
</record>
</GetRecord>
</OAI-PMH>
Asking Google for "OAI validator" turns up quite a few hits. The only one I'm at all familiar with is OVAL: http://oval.base-search.net/
Thank you for that observation - I should have checked also.
I have now checked the "offending" endpoint with http://oval.base-search.net/ and http://validator.oaipmh.com/. The latter produced no error for ListRecords, and the former produced an error about "No incremental harvesting (day granularity) of ListRecords", which I think is irrelevant here.
Output from a third validator can be found at http://oanet.cms.hu-berlin.de/validator/pages/validation_dini_results.xhtml?vid=ZUZaM2FscFM2NEpUY2lncHdZYno2QT09 - I don't feel qualified to ascertain the relevance of any of these to the Exception at hand.
I believe this is related to more than two levels of nested <element> tags in the DSpace-generated xoai.
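One way to check the nesting hypothesis independently of xoai is to measure how deeply the <element> tags nest in a given record. The following is a minimal sketch that uses only the JDK's StAX API (javax.xml.stream), not xoai's own reader; the class name `ElementDepthSketch` and the method `maxElementDepth` are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of xoai: reports how deeply xoai <element> tags nest.
final class ElementDepthSketch {
    static final String XOAI_NS = "http://www.lyncode.com/xoai";

    // Returns the deepest nesting level of xoai <element> tags (0 if there are none).
    static int maxElementDepth(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int depth = 0;
        int max = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT && isXoaiElement(reader)) {
                    depth++;
                    max = Math.max(max, depth);
                } else if (event == XMLStreamConstants.END_ELEMENT && isXoaiElement(reader)) {
                    depth--;
                }
            }
        } finally {
            reader.close();
        }
        return max;
    }

    // True when the current start/end tag is <element> in the xoai namespace.
    private static boolean isXoaiElement(XMLStreamReader reader) {
        return "element".equals(reader.getLocalName())
                && XOAI_NS.equals(reader.getNamespaceURI());
    }
}
```

Passing the minimal test case from this thread to `ElementDepthSketch.maxElementDepth(...)` gives the depth of the dc / contributor / author / none chain, which can then be compared against records that parse without error.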
The problem is related to the underlying XmlReader, which consumes events without checking whether they are the ones that were requested. After some hacking, the simplest fix I could identify was to check in the MetadataParser that the EOD has not been reached. If someone else is in agreement, I can add a test case and make a pull request.
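To illustrate the kind of guard described above, here is a minimal sketch of a recursive traversal that inspects each event before acting on it and checks that the end of the document has not been reached before reading further. It is written against the JDK's StAX API rather than xoai's XmlReader, and it is not the actual MetadataParser fix; `GuardedXoaiTraversal`, `Node` and `parseElement` here are illustrative names only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not xoai code: guarded recursive parsing of nested <element> tags.
final class GuardedXoaiTraversal {

    // Simple tree node: an <element> with its child elements and <field> values.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final List<String> values = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Collects every top-level <element> found in the document under a synthetic root.
    static Node parse(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            Node root = new Node("root");
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "element".equals(reader.getLocalName())) {
                    root.children.add(parseElement(reader));
                }
            }
            return root;
        } finally {
            reader.close();
        }
    }

    // Precondition: the reader is positioned on the START_ELEMENT of an <element>.
    private static Node parseElement(XMLStreamReader reader) throws XMLStreamException {
        Node node = new Node(reader.getAttributeValue(null, "name"));
        while (reader.hasNext()) { // guard: never read past the end of the document
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String local = reader.getLocalName();
                if ("element".equals(local)) {
                    node.children.add(parseElement(reader)); // any nesting depth
                } else if ("field".equals(local)) {
                    node.values.add(reader.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "element".equals(reader.getLocalName())) {
                return node; // matching close tag for this <element>
            }
            // Comments, whitespace and other events are skipped deliberately.
        }
        throw new XMLStreamException("Document ended inside element '" + node.name + "'");
    }
}
```

Because each nested <element> returns on its own closing tag, the recursion does not depend on a fixed number of levels, and a missing closing tag produces an exception instead of a read past the end of the stream.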
I had a long plane journey, so I rewrote the traversal code underlying MetadataParser, which has a number of problems when parsing xoai.
My revised MetadataParser can be found at https://github.com/cmacdonald/xoai/commit/05f67f26bf8eb3c2eff60a076edc3c7189163c57
I have my own application code, which I have tested with examples of OAI from Pure, DSpace and EPrints. I can make unit tests for xoai-serviceprovider.
Stacktrace as follows:
Minimum reproducible:
Example record at: view-source:http://repository.abertay.ac.uk/oai/request?verb=GetRecord&metadataPrefix=xoai&identifier=oai:repository.abertay.ac.uk:10373/1861
parseElement is failing at: parsing of license. The example contains the MIME-encoded contents of the license.
v4.2.1-SNAPSHOT cloned from git repo today.
Any ideas?