gdcc / xoai

OAI-PMH Java Toolkit
BSD 3-Clause "New" or "Revised" License

Empty namespaces on attributes overwriting the default namespace #240

Closed: bumann-sbb closed this issue 1 month ago

bumann-sbb commented 1 month ago

We would like to use your library as the implementation for our OAI-PMH data provider API. However, we are running into a problem with the transformation during the GetRecord operation.

We have developed an XSLT script that transforms a TEI XML document into a Dublin Core XML document. When we apply this script to our XML document in an XML editor such as IntelliJ or Oxygen, the result is as expected. However, when using this library, we encounter issues related to namespaces.

In accordance with the TEI XML standard, the TEI namespace is declared as the default namespace. While debugging "GetRecord" with a breakpoint at MetadataHelper.java:20, we could see that the value of "metadata" was still correct. After line 20 executed, empty namespace declarations (xmlns="") had been added to a number of elements, including "name" and "ptr", as shown in the following example (shortened for brevity):

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xml:lang="de">
      ...
      <publicationStmt>
        <publisher>
          <name xmlns="" type="org">Org</name>
          <ptr xmlns="" target="http://www.org.de"/>
        </publisher>
        ...
      </publicationStmt>
      ...
  </teiHeader>
</TEI>

This results in the failure of the transformation, and our Dublin Core XML is composed solely of empty tags. Upon debugging in metadata.write(writer), it ultimately reaches EchoElement.java:66. At this point, it can be observed that when the tag <name> is processed, the correct namespace (http://www.tei-c.org/ns/1.0) is mapped to it.
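The effect of the stray xmlns="" can be reproduced outside the library. The following is a minimal sketch using only the JDK's DOM parser; the class name `EmptyNamespaceDemo` and the helper `countTeiNames` are made up for illustration and are not part of xoai. It counts `name` elements in the TEI namespace, which is roughly what a namespace-aware XSLT match on TEI elements depends on.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of xoai.
final class EmptyNamespaceDemo {
    static final String TEI_NS = "http://www.tei-c.org/ns/1.0";

    // Intended document: <name> inherits the default TEI namespace.
    static final String EXPECTED =
        "<TEI xmlns=\"" + TEI_NS + "\"><publisher>"
        + "<name type=\"org\">Org</name></publisher></TEI>";

    // Document as observed in the issue: xmlns="" resets the default namespace.
    static final String OBSERVED =
        "<TEI xmlns=\"" + TEI_NS + "\"><publisher>"
        + "<name xmlns=\"\" type=\"org\">Org</name></publisher></TEI>";

    // Counts <name> elements that are in the TEI namespace.
    static int countTeiNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagNameNS(TEI_NS, "name").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countTeiNames(EXPECTED)); // prints 1
        System.out.println(countTeiNames(OBSERVED)); // prints 0
    }
}
```

With xmlns="" present, `name` is no longer in the TEI namespace, so a stylesheet that selects TEI-namespaced elements would find nothing to copy values from.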

When an element with an attribute is processed, EchoElement.java:84 shows that the attribute is associated with an empty namespace. As a result, xmlns="" is written to the XML output.
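For context, an unprefixed attribute never inherits the default namespace, so an empty namespace URI on "type" or "target" is expected when reading. Trouble only starts if that empty URI is written back as a namespace declaration on the element. A small sketch with the JDK's StAX event reader shows what a reader reports; the class `AttributeNamespaceDemo` and its `namespaces` helper are hypothetical names for illustration, not xoai's code.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of xoai.
final class AttributeNamespaceDemo {
    static final String TEI_NS = "http://www.tei-c.org/ns/1.0";
    static final String SAMPLE =
        "<TEI xmlns=\"" + TEI_NS + "\"><name type=\"org\">Org</name></TEI>";

    // Maps "element:<local name>" and "attribute:<local name>" to namespace URIs.
    static Map<String, String> namespaces(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                StartElement start = event.asStartElement();
                result.put("element:" + start.getName().getLocalPart(),
                           start.getName().getNamespaceURI());
                Iterator<?> attributes = start.getAttributes();
                while (attributes.hasNext()) {
                    Attribute attribute = (Attribute) attributes.next();
                    result.put("attribute:" + attribute.getName().getLocalPart(),
                               attribute.getName().getNamespaceURI());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not read sample XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // The element reports the TEI namespace; the attribute reports "".
        namespaces(SAMPLE).forEach((key, uri) ->
            System.out.println(key + " -> \"" + uri + "\""));
    }
}
```

Under this reading, the empty URI on the attribute is not itself a defect. A writer would presumably need to avoid turning it into a default namespace declaration, though whether that is the right fix for xoai is for the maintainers to judge.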

Upon commenting out EchoElement.java:85, the entire transformation functions as intended, yielding a valid Dublin Core XML. Given my limited expertise in XML, XML Namespaces, and XSLT, I am uncertain about the optimal solution that would not disrupt other use cases.

Should you require further information, kindly let me know and I will provide it.

pdurbin commented 1 month ago

@bumann-sbb thanks for your interest in using this library. Do you want to go ahead and make a pull request? Even if it isn't optimal, it could help move the conversation along.

bumann-sbb commented 1 month ago

@pdurbin I have submitted a pull request. Since this is my first contribution on GitHub, I hope I followed the correct procedure.

pdurbin commented 1 month ago

@bumann-sbb thanks. Let's see what the test suite says. I just clicked "approve and run".

I've barely worked on this code but your pull request looks reasonable to me. Congrats on your first GitHub PR! 🎉

bumann-sbb commented 1 month ago

@pdurbin Thank you! I just saw that the build fails due to formatting problems. I will take care of this and push another commit in a few minutes.

pdurbin commented 1 month ago

Ah. Spotless. I clicked "approve and run" again.

poikilotherm commented 1 month ago

Version 5.2.1 was just released; it should be on Central soon 🤞