eduardorep closed this 2 weeks ago.
@eduardorep hi! Do you have a URL we can harvest from to test this?
Sure thing, here you go mate: https://doaj.org/oai?verb=ListRecords&metadataPrefix=oai_dc&setSpec=TENDOkRlcm1hdG9sb2d5
Hi @eduardorep just to make sure I got this right: we are talking about the service provider here, using it to harvest a resource, aye?
I tried to read up on the other issues, and it seems like you want this to parse just fine, right? Have you tried our new service provider yet? It already ships an updated version of Woodstox, which might change things.
Haven't tried it yet, because we were trying to find out whether it would solve our issues. Since switching to this lib would bring breaking changes, I was just trying to understand whether this issue had been tackled explicitly. But it seems like your suggestion might be a viable option. Thank you very much, we will likely try it :) Have a nice one!
Please feel free to come back anytime! This is probably something that affects Dataverse installations around the world. Fixing it would definitely be in scope!
Hey, so we upgraded our XOAI to this fork to test whether the issue described in https://github.com/DSpace/xoai/issues/67 still happens, and I'm afraid it does.
For the records listed in the following response:
https://doaj.org/oai?verb=ListRecords&metadataPrefix=oai_dc&setSpec=TENDOkRlcm1hdG9sb2d5
Processing returns the error "The prefix xsi for attribute xsi:schemaLocation associated with an element type oai_dc:dc is not bound."
This issue seems to be caused by the fact that the xsi namespace prefix (xmlns:xsi) is declared only on the root OAI-PMH element, and not on each oai_dc:dc element.
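A minimal reproduction with the JDK's built-in namespace-aware DOM parser shows why the same record is fine inside the full response but fails once its oai_dc:dc element is handled on its own. The record below is a trimmed, hypothetical stand-in for a DOAJ response, and the assumption that the record parser ends up processing the metadata payload separately from the root element is ours, not something taken from XOAI's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class UnboundPrefixDemo {
    // Hypothetical, trimmed-down oai_dc record: it uses xsi:schemaLocation
    // but does not declare xmlns:xsi itself.
    static final String FRAGMENT =
        "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
        + " xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/oai_dc/"
        + " http://www.openarchives.org/OAI/2.0/oai_dc.xsd\">"
        + "<dc:title>Example</dc:title></oai_dc:dc>";

    // The same record embedded in a response whose root element declares xmlns:xsi.
    static final String FULL_RESPONSE =
        "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
        + "<ListRecords><record><metadata>" + FRAGMENT
        + "</metadata></record></ListRecords></OAI-PMH>";

    /** Parses with a namespace-aware DOM parser; returns null on success, else the error message. */
    static String tryParse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors without printing them
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Parses: xsi is in scope because the root element declares it.
        System.out.println("Full response: " + tryParse(FULL_RESPONSE));
        // Fails: on its own, the fragment has no binding for the xsi prefix.
        System.out.println("Fragment alone: " + tryParse(FRAGMENT));
    }
}
```

Per the Namespaces in XML rules, a declaration on the root is in scope for every descendant, so the error only surfaces once the payload is cut out of its surrounding document.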
While this issue is ultimately caused by DOAJ's non-compliance with the OAI-PMH specification, it would be great if the XOAI parser could be configured either to ignore namespace errors or to add the namespace declarations from the root element to any node that is missing them.
However, I believe this would be a complicated change, and it would probably not be relevant for Dataverse. Do correct me if I'm wrong, though :)
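The second option mentioned above, copying the root element's namespace declarations onto each metadata payload, can be sketched with the JDK's DOM and identity Transformer APIs. This is a minimal illustration under stated assumptions, not XOAI's record parser: the class and method names (RootNamespaceCopier, extractMetadata) are hypothetical, and the sketch assumes the whole response fits in memory as a DOM tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RootNamespaceCopier {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";
    static final String XMLNS_NS = XMLConstants.XMLNS_ATTRIBUTE_NS_URI;

    // Hypothetical response: xmlns:xsi is declared only on the root element.
    static final String SAMPLE =
        "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
        + "<ListRecords><record><metadata>"
        + "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
        + " xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/oai_dc/"
        + " http://www.openarchives.org/OAI/2.0/oai_dc.xsd\">"
        + "<dc:title>Example</dc:title></oai_dc:dc>"
        + "</metadata></record></ListRecords></OAI-PMH>";

    /** Namespace-aware parse; wraps any parser error in an unchecked exception. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors without printing them
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse: " + e.getMessage(), e);
        }
    }

    static Element firstChildElement(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Serializes each metadata payload so that it also carries the root element's namespace declarations. */
    static List<String> extractMetadata(String responseXml) {
        Document doc = parse(responseXml);
        NamedNodeMap rootAttrs = doc.getDocumentElement().getAttributes();
        List<String> payloads = new ArrayList<>();
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            NodeList metadata = doc.getElementsByTagNameNS(OAI_NS, "metadata");
            for (int i = 0; i < metadata.getLength(); i++) {
                Element payload = firstChildElement(metadata.item(i));
                if (payload == null) {
                    continue;
                }
                for (int j = 0; j < rootAttrs.getLength(); j++) {
                    Attr decl = (Attr) rootAttrs.item(j);
                    // Copy only namespace declarations the payload does not already make itself.
                    if (XMLNS_NS.equals(decl.getNamespaceURI())
                            && !payload.hasAttributeNS(XMLNS_NS, decl.getLocalName())) {
                        payload.setAttributeNS(XMLNS_NS, decl.getName(), decl.getValue());
                    }
                }
                StringWriter out = new StringWriter();
                identity.transform(new DOMSource(payload), new StreamResult(out));
                payloads.add(out.toString());
            }
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return payloads;
    }

    public static void main(String[] args) {
        for (String payload : extractMetadata(SAMPLE)) {
            parse(payload); // throws if a prefix is still unbound
            System.out.println(payload);
        }
    }
}
```

Skipping declarations the payload already makes keeps any local redeclaration intact, so a record that binds a prefix differently from the root is not silently changed.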
@eduardorep @jfeio are you aware of other systems besides DOAJ that are out of compliance with the spec in this way? I'm wondering how common of a problem this is.
Are either of you interested in creating a pull request? (If so, before you start, I'd like to hear what @landreev and @poikilotherm think.)
Yes, there is another one, ScieloBR: https://github.com/DOAJ/doaj/issues/2186#issuecomment-1476402391
Their website: https://www.scielo.br/
An example of a list record from that repository: https://oaipmh.scielo.org/br/oai?verb=ListRecords&metadataPrefix=oai_dc
Hope this helps!
Hi @eduardorep and @jfeio !
I looked into this again today and put some thought into it. Dataverse does not necessarily have the problem you describe, as we don't use this project's record parser there, but a custom one.
In the data provider, we had a similar problem: we already create some XML files and wanted to "just include them" in the response. So maybe the same trick would be useful here, too? Would you benefit from using such a CopyElement that simply transfers the content inside <metadata></metadata> unprocessed? It would be part of the resulting Record's Metadata. From there, you could write it to a String or wherever you like using an XmlWriter.
As for configuring when to go one way or the other, the Context you provide to the ServiceProvider can hold the information about your choices here.
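A rough sketch of the copy-versus-parse choice, with the choice carried by a context object, might look like this. MetadataMode, HarvestContext, HarvestedMetadata and rawMetadata are hypothetical names standing in for XOAI's CopyElement, Context and Metadata, whose real APIs may look different. The payload is cut out with plain string search, assuming the unprefixed, attribute-free <metadata> tag that OAI-PMH responses use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MetadataModeSketch {
    /** Hypothetical switch a harvesting context could carry; not part of XOAI. */
    enum MetadataMode { PARSE, COPY_RAW }

    /** Hypothetical stand-in for the Context handed to the ServiceProvider. */
    record HarvestContext(MetadataMode mode) {}

    /** Hypothetical result: the metadata XML plus whether it went through the parser. */
    record HarvestedMetadata(String xml, boolean parsed) {}

    static final String OPEN = "<metadata>";
    static final String CLOSE = "</metadata>";

    // Hypothetical record: xmlns:xsi is declared on an ancestor, not on oai_dc:dc.
    static final String SAMPLE_RECORD =
        "<record xmlns=\"http://www.openarchives.org/OAI/2.0/\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
        + "<metadata><oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
        + " xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/oai_dc/"
        + " http://www.openarchives.org/OAI/2.0/oai_dc.xsd\">"
        + "<dc:title>Example</dc:title></oai_dc:dc></metadata></record>";

    /** Returns the text between <metadata> and </metadata> exactly as it appears in the input. */
    static String rawMetadata(String recordXml) {
        int start = recordXml.indexOf(OPEN);
        int end = recordXml.indexOf(CLOSE, Math.max(start, 0));
        if (start < 0 || end < 0) {
            throw new IllegalArgumentException("no <metadata> element found");
        }
        return recordXml.substring(start + OPEN.length(), end);
    }

    static HarvestedMetadata handle(HarvestContext ctx, String recordXml) {
        String payload = rawMetadata(recordXml);
        if (ctx.mode() == MetadataMode.COPY_RAW) {
            // Keep the payload verbatim: nothing is parsed, so an unbound xsi prefix is not an error here.
            return new HarvestedMetadata(payload, false);
        }
        try {
            // PARSE: check the payload on its own with a namespace-aware parser.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors without printing them
            builder.parse(new InputSource(new StringReader(payload)));
            return new HarvestedMetadata(payload, true);
        } catch (Exception e) {
            throw new IllegalStateException("metadata rejected: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        HarvestedMetadata raw = handle(new HarvestContext(MetadataMode.COPY_RAW), SAMPLE_RECORD);
        System.out.println("COPY_RAW kept: " + raw.xml());
        try {
            handle(new HarvestContext(MetadataMode.PARSE), SAMPLE_RECORD);
        } catch (IllegalStateException e) {
            System.out.println("PARSE failed: " + e.getMessage());
        }
    }
}
```

The COPY_RAW path leaves any validation to the caller, which is also the natural place for a repair step before the XML is written out elsewhere.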
@eduardorep @jfeio please also feel free to join us on Zulip to discuss this in a less async way. Here's an invite link; see you on the dev channel!
Hi @poikilotherm! We actually ended up creating a fork of DSpace/xoai and adapted the record parser so that it detects whether a given element uses an xsi attribute without declaring the xsi namespace; if so, the parser adds the missing declaration to the offending element before validating it.
This solution is not as generic as the solution you are implementing, but for our purposes, it works fine ;)
Feel free to point me to your implementation or create a pull request. Always happy to add something like this: the fewer forks to maintain, the better.
@jfeio hi! I'm also curious about your implementation. Is the commit online?
@eduardorep @jfeio you are still welcome to point us to your commits, so we can add the fix here as well. Even better: create a PR!
For the time being, I'll close this. Feel free to reopen.
Hello, I'd like to know whether by any chance you faced this issue, https://github.com/DSpace/xoai/issues/67, and if so, whether you resolved it or have any knowledge that might help us resolve it.