DSpace / xoai

OAI-PMH Java Toolkit

NoSuchElementException when parsing xoai #66

Open cmacdonald opened 7 years ago

cmacdonald commented 7 years ago

Stacktrace as follows:

Exception in thread "main" java.util.NoSuchElementException
    at org.codehaus.stax2.ri.Stax2EventReaderImpl.throwEndOfInput(Stax2EventReaderImpl.java:453)
    at org.codehaus.stax2.ri.Stax2EventReaderImpl.nextEvent(Stax2EventReaderImpl.java:242)
    at com.lyncode.xml.XmlReader.next(XmlReader.java:134)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:43)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
    at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parse(MetadataParser.java:34)
    at org.dspace.xoai.serviceprovider.parsers.RecordParser.parse(RecordParser.java:56)
    at org.dspace.xoai.serviceprovider.parsers.ListRecordsParser.next(ListRecordsParser.java:60)
    at org.dspace.xoai.serviceprovider.handler.ListRecordHandler.nextIteration(ListRecordHandler.java:71)
    at org.dspace.xoai.serviceprovider.lazy.ItemIterator.hasNext(ItemIterator.java:32)
    at org.dspace.xoai.serviceprovider.lazy.ItemIterator.<init>(ItemIterator.java:22)
    at org.dspace.xoai.serviceprovider.ServiceProvider.listRecords(ServiceProvider.java:57)

Minimum reproducible:

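// The Context declaration was not part of the original report; default construction is assumed here.
Context context = new Context();
OAIClient oaiClient = new HttpOAIClient("http://repository.abertay.ac.uk/oai/request");
context.withOAIClient(oaiClient);
ServiceProvider ssoarOaiPmhEndpoint = new ServiceProvider(context);
ListRecordsParameters parameters = new ListRecordsParameters();
parameters.withMetadataPrefix("xoai");
// This call throws the NoSuchElementException shown in the stack trace above.
ssoarOaiPmhEndpoint.listRecords(parameters);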

Example record at: view-source:http://repository.abertay.ac.uk/oai/request?verb=GetRecord&metadataPrefix=xoai&identifier=oai:repository.abertay.ac.uk:10373/1861

parseElement is failing while parsing the license element. Example:

<element name="license"><field name="bin">Tk9URTogVGhpcyBpcyB0aGUgZGVmYXVsdCBsaWNlbmNlIHRoYXQgdGhlIFVuaXZlcnNpdHkgb2YgQWJlcnRheSAKRHVuZGVlIHJlcXVpcmVzIGFsbCBzdWJtaXR0ZXJzIHRvIGdyYW50LgoKTk9OLUVYQ0xVU0lWRSBESVNUUklCVVRJT04gTElDRU5DRQoKQnkgYWdyZWVpbmcgYW5kIHN1Ym1pdHRpbmcgdGhpcyBsaWNlbmNlLCB5b3UgKHRoZSBhdXRob3IocyksIApjb3B5cmlnaHQgb3duZXIgb3Igbm9taW5hdGVkIGFnZW50KSBncmFudHMgdG8gVW5pdmVyc2l0eSBvZiBBYmVydGF5IApEdW5kZWUgKFVBRCkgdGhlIG5vbi1leGNsdXNpdmUgcmlnaHQgdG8gcmVwcm9kdWNlLCB0cmFuc2xhdGUgCihhcyBkZWZpbmVkIGJlbG93KSwgYW5kL29yIGRpc3RyaWJ1dGUgeW91ciBzdWJtaXNzaW9uIChpbmNsdWRpbmcgdGhlIAphYnN0cmFjdCkgd29ybGR3aWRlIGluIHByaW50IGFuZCBlbGVjdHJvbmljIGZvcm1hdCBhbmQgaW4gYW55IG1lZGl1bSwgCmluY2x1ZGluZyBidXQgbm90IGxpbWl0ZWQgdG8gYXVkaW8gb3IgdmlkZW8uCgpZb3UgYWdyZWUgdGhhdCBVQUQgbWF5LCB3aXRob3V0IGNoYW5naW5nIHRoZSBjb250ZW50LCB0cmFuc2xhdGUgdGhlCnN1Ym1pc3Npb24gdG8gYW55IG1lZGl1bSBvciBmb3JtYXQgZm9yIHRoZSBwdXJwb3NlIG9mIHByZXNlcnZhdGlvbi4gCllvdSBhbHNvIGFncmVlIHRoYXQgVUFEIG1heSBrZWVwIG1vcmUgdGhhbiBvbmUgY29weSBvZiB0aGlzIApzdWJtaXNzaW9uIGZvciBwdXJwb3NlcyBvZiBzZWN1cml0eSwgYmFjay11cCBhbmQgcHJlc2VydmF0aW9uLgoKWW91IHJlcHJlc2VudCB0aGF0IHRoZSBzdWJtaXNzaW9uIGlzIG9yaWdpbmFsIHdvcmssIGFuZCB0aGF0IHlvdQpoYXZlIHRoZSByaWdodCB0byBncmFudCB0aGUgcmlnaHRzIGNvbnRhaW5lZCBpbiB0aGlzIGxpY2VuY2UuIFlvdSAKYWxzbyByZXByZXNlbnQgdGhhdCB5b3VyIHN1Ym1pc3Npb24gZG9lcyBub3QsIHRvIHRoZSBiZXN0IG9mIHlvdXIgCmtub3dsZWRnZSwgaW5mcmluZ2UgdXBvbiBhbnlvbmUncyBjb3B5cmlnaHQuCgpJZiB0aGUgc3VibWlzc2lvbiBjb250YWlucyBtYXRlcmlhbCBmb3Igd2hpY2ggeW91IG9yIHlvdXIgcHVibGlzaGVyCmRvIG5vdCBob2xkIGNvcHlyaWdodCwgeW91IHJlcHJlc2VudCB0aGF0IHlvdSBoYXZlIG9idGFpbmVkIHRoZQp1bnJlc3RyaWN0ZWQgcGVybWlzc2lvbiBvZiB0aGUgY29weXJpZ2h0IG93bmVyIHRvIGdyYW50IFVBRCB0aGUKcmlnaHRzIHJlcXVpcmVkIGJ5IHRoaXMgbGljZW5jZSwgYW5kIHRoYXQgc3VjaCB0aGlyZC1wYXJ0eSBvd25lZAptYXRlcmlhbCBpcyBjbGVhcmx5IGlkZW50aWZpZWQgYW5kIGFja25vd2xlZGdlZCB3aXRoaW4gdGhlIHRleHQgb3IKY29udGVudCBvZiB0aGUgc3VibWlzc2lvbi4KCklGIFRIRSBTVUJNSVNTSU9OIElTIEJBU0VEIFVQT04gV09SSyBUSEFUIEhBUyBCRUVOIFNQT05TT1JFRCBPUiAKU1VQUE9SVEVEIEJZIEFOIE
FHRU5DWSBPUiBPUkdBTklaQVRJT04gT1RIRVIgVEhBTiBVQUQsIFlPVSBSRVBSRVNFTlQgClRIQVQgWU9VIEhBVkUgRlVMRklMTEVEIEFOWSBSSUdIVCBPRiBSRVZJRVcgT1IgT1RIRVIgT0JMSUdBVElPTlMgClJFUVVJUkVEIEJZIFNVQ0ggQ09OVFJBQ1QgT1IgQUdSRUVNRU5ULgoKVUFEIHdpbGwgY2xlYXJseSBpZGVudGlmeSB5b3VyIG5hbWUocykgYXMgdGhlIGF1dGhvcihzKSBvciBvd25lcihzKSAKb2YgdGhlIHN1Ym1pc3Npb24sIGFuZCB3aWxsIG5vdCBtYWtlIGFueSBhbHRlcmF0aW9uLCBvdGhlciB0aGFuIGFzIAphbGxvd2VkIGJ5IHRoaXMgbGljZW5jZSwgdG8geW91ciBzdWJtaXNzaW9uLgo=</field>
</element>

This field contains the Base64 (MIME) encoded contents of the license.
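To confirm what the "bin" field holds, it can be decoded with the JDK's Base64 support. This is a minimal sketch, not part of xoai: the class name LicenseFieldDecoder and the UTF-8 charset are assumptions, and only the first characters of the field above are used. The MIME decoder is chosen because it ignores line breaks inside the encoded text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for inspecting the Base64 text of a <field name="bin">.
class LicenseFieldDecoder {

    // Decodes Base64 text; the MIME decoder skips line separators in the input.
    // Assumption: the decoded bytes are UTF-8 text.
    static String decode(String fieldText) {
        byte[] raw = Base64.getMimeDecoder().decode(fieldText);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First 44 characters of the license field quoted in this issue.
        String prefix = "Tk9URTogVGhpcyBpcyB0aGUgZGVmYXVsdCBsaWNlbmNl";
        System.out.println(decode(prefix));
    }
}
```

Decoding the full field the same way should yield the plain-text deposit licence, which suggests the content itself is ordinary character data as far as the XML parser is concerned.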

v4.2.1-SNAPSHOT cloned from git repo today.

Any ideas?

cmacdonald commented 7 years ago

Removing that particular <element> makes no difference in a test parse. This xoai was generated by a DSpace repository. I'm puzzled.

mmalmeida commented 7 years ago

Could you create a minimum test case with the failing example (and the XML not retrieved from the site, so the example is stable)?

Could you also confirm if the XML is OAI valid?

cmacdonald commented 7 years ago

Minimal test case, inserted inline as GH won't take an XML attachment.

As to its validity, I can confirm it was created by a DSpace instance. Do you have an XOAI validator?

Will also update issue title.

<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
    <responseDate>2017-07-31T20:10:46Z</responseDate>
    <request verb="GetRecord" identifier="oai:repository.somewhere.ac.uk:10373/1861"
        metadataPrefix="xoai">http://repository.somewhere.ac.uk/oai/request</request>
    <GetRecord>
        <record>
            <header>
                <identifier>oai:repository.somewhere.ac.uk:9999/999</identifier>
                <datestamp>2015-02-03T17:41:40Z</datestamp>
                <setSpec>com_10373_3</setSpec>
                <setSpec>col_10373_12</setSpec>
            </header>
            <metadata>
                <metadata xmlns="http://www.lyncode.com/xoai">
                    <element name="dc">
                        <!--  either this block -->
                        <element name="contributor">
                            <element name="author">
                                <element name="none">
                                    <field name="value">Author1, First A.</field>
                                    <field name="value">Author2, Second</field>
                                    <field name="value">Author3, Third</field>
                                </element>
                            </element>
                        </element>
                        <!--  or this following commented block -->
                        <!--  
                        <element name="relation">
                            <element name="ispartof">
                                <element name="en">
                                    <field name="value">Another article 6(4)</field>
                                </element>
                            </element>
                        </element>
                         -->
                    </element>
                </metadata>
            </metadata>
        </record>
    </GetRecord>
</OAI-PMH>

mwoodiupui commented 7 years ago

Asking Google for "OAI validator" turns up quite a few hits. The only one I'm at all familiar with is OVAL: http://oval.base-search.net/

cmacdonald commented 7 years ago

Thank you for that observation - I should have checked also.

I have now checked the "offending" endpoint with http://oval.base-search.net/ and http://validator.oaipmh.com/. The latter produced no error for ListRecords, and the former produced an error about "No incremental harvesting (day granularity) of ListRecords", which I think is irrelevant here.

Output from a third validator can be found at http://oanet.cms.hu-berlin.de/validator/pages/validation_dini_results.xhtml?vid=ZUZaM2FscFM2NEpUY2lncHdZYno2QT09 - I don't feel qualified to ascertain the relevance of any of these to the Exception at hand.

cmacdonald commented 7 years ago

I believe this is caused by more than two levels of nested <element> tags in the DSpace-generated xoai.
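One way to check how deeply a given record nests its <element> tags is to count the depth with the JDK's StAX reader. This is a rough diagnostic sketch, independent of xoai; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical diagnostic: reports the deepest nesting of <element> tags.
class ElementDepth {

    static int maxElementDepth(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int depth = 0;
            int max = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                boolean isElementTag =
                        (event == XMLStreamConstants.START_ELEMENT
                                || event == XMLStreamConstants.END_ELEMENT)
                        && "element".equals(reader.getLocalName());
                if (!isElementTag) {
                    continue; // <field>, text, comments and other tags do not affect depth
                }
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    max = Math.max(max, depth);
                } else {
                    depth--;
                }
            }
            return max;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML is not well formed", e);
        }
    }
}
```

For the minimal test case posted earlier in this thread (dc, contributor, author, none) this count would be four.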

cmacdonald commented 7 years ago

The problem is related to the underlying XmlReader, which consumes events without checking whether they are what was requested. After some hacking, the simplest fix I could identify was to check in MetadataParser that the end of the document (EOD) had not been reached. If someone else is in agreement, I can add a test case and make a pull request.
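The idea of guarding each read against the end of input can be illustrated with the JDK's own StAX event reader. This is only a sketch under stated assumptions: it is not the xoai MetadataParser, not the attached diff, and it does not use com.lyncode.xml.XmlReader. The class NestedElementSketch and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch: recursive descent over nested <element>/<field> tags.
class NestedElementSketch {

    // Returns "path.fieldName=text" for every <field> under nested <element> tags.
    static List<String> parse(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement() && isNamed(event.asStartElement(), "element")) {
                    parseElement(reader, nameOf(event.asStartElement()), out);
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML is not well formed", e);
        }
        return out;
    }

    private static void parseElement(XMLEventReader reader, String path, List<String> out)
            throws XMLStreamException {
        // hasNext() is checked before every nextEvent(), so running past the end of
        // the document ends the loop instead of raising NoSuchElementException.
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                if (isNamed(start, "element")) {
                    parseElement(reader, path + "." + nameOf(start), out); // one level deeper
                } else if (isNamed(start, "field")) {
                    out.add(path + "." + nameOf(start) + "=" + reader.getElementText());
                }
            } else if (event.isEndElement()
                    && "element".equals(event.asEndElement().getName().getLocalPart())) {
                return; // this level's closing </element>
            }
        }
    }

    private static boolean isNamed(StartElement start, String localName) {
        return localName.equals(start.getName().getLocalPart());
    }

    private static String nameOf(StartElement start) {
        Attribute name = start.getAttributeByName(new QName("name"));
        return name == null ? "?" : name.getValue();
    }

    public static void main(String[] args) {
        String xml = "<element name=\"dc\"><element name=\"contributor\">"
                + "<element name=\"author\"><element name=\"none\">"
                + "<field name=\"value\">Author1, First A.</field>"
                + "</element></element></element></element>";
        parse(xml).forEach(System.out::println);
    }
}
```

Each recursive call returns only on its own closing </element>, so the caller never consumes an end tag that belongs to a different level, and the depth of nesting does not matter.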

MetadataParser.diff.txt

cmacdonald commented 7 years ago

I had a long plane journey, so rewrote the traversal code underlying MetadataParser, which has a number of problems when parsing xoai:

My revised MetadataParser can be found at https://github.com/cmacdonald/xoai/commit/05f67f26bf8eb3c2eff60a076edc3c7189163c57

I have tested my own application code with examples of OAI from Pure, DSpace and EPrints. I can make unit tests for xoai-serviceprovider.