dcmi / pids_in_dc

Expressing PIDs in Dublin Core

Multiple values in an attribute #18

Closed tombaker closed 2 years ago

tombaker commented 2 years ago

In 2018, the discussion of the PID proposal raised the possibility that a PID attribute could have multiple values.

However, I happened to stumble across the assertion, in an XML tutorial, that attributes "cannot contain multiple values", while child elements can.

Is this true about XML attributes in general?

jesusbagpuss commented 2 years ago

TL;DR: I think the proposal is workable, although more ardent XML experts might cringe at it. I think the approach is lightweight and pragmatic, and importantly shouldn't break existing integrations.

I'm guessing this was the tutorial you found: https://beginnersbook.com/2018/10/xml-attributes/

As mentioned in the tutorial, you can't have this (repeated attributes): <element attr="foo" attr="bar">, but you can have this: <element attr="foo bar">. In the second case, though, attr has the single value 'foo bar' rather than the two values 'foo' and 'bar'. I think this is what the tutorial means: the attribute has one value.
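To make the distinction concrete, here is a minimal sketch in Python (my choice of language for illustration, using only the standard library's xml.etree.ElementTree). The element and attribute names are the placeholder ones from the paragraph above.

```python
import xml.etree.ElementTree as ET

# A repeated attribute is a well-formedness error, so the parser rejects it.
try:
    ET.fromstring('<element attr="foo" attr="bar"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

# A single attribute whose value happens to contain a space parses fine,
# and the parser reports it as ONE string value.
element = ET.fromstring('<element attr="foo bar"/>')
single_value = element.get("attr")

print(duplicate_rejected)  # True
print(single_value)        # foo bar
```

Any splitting into 'foo' and 'bar' is therefore a convention applied by the consumer, not something the XML parser does.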

If we're representing PIDs as URLs, which can't contain (unencoded) whitespace, I think we could treat a 'pids' attribute as a series of whitespace-separated words.

As we're trying to coerce PIDs into the existing DC elements with as little change as possible, I think a pids="X Y" attribute is workable.
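As a rough sketch of what a consumer would have to do with such an attribute, the following Python splits the value on whitespace. The `<record>` wrapper element and the `split_pids` helper are hypothetical names made up for this illustration; the creator values are borrowed from the example record in this comment.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def split_pids(element):
    """Return the PIDs in the element's 'pids' attribute as a list of strings."""
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading/trailing whitespace, so "", " " and "X " are all safe.
    return element.get("pids", "").split()


# <record> is a hypothetical wrapper used only to keep the sample short.
record = ET.fromstring(
    f'<record xmlns:dc="{DC_NS}">'
    '<dc:creator pids="https://orcid.org/0000-1234-1234-5432 https://isni.org/ABCD-ZYZ">Gibard, C</dc:creator>'
    '<dc:creator pids=" ">Gorrell, IB</dc:creator>'
    '<dc:creator>Krishnamurthy, R</dc:creator>'
    '</record>'
)

for creator in record.findall(f"{{{DC_NS}}}creator"):
    print(creator.text, split_pids(creator))
```

A missing, empty, or whitespace-only attribute all yield an empty list here, so existing records without PIDs need no special handling.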

If we were creating a new metadata schema (or writing a full extension to an existing one) then a more usual representation e.g. <pids><pid>X</pid><pid>Y</pid></pids> would probably be the right approach.
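For comparison, a small Python sketch (standard library only) showing that a consumer ends up with the same list from either representation. The un-namespaced `<creator>` element is a simplification assumed for brevity, not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Element-based form: one <pid> child per identifier.
nested = ET.fromstring("<pids><pid>X</pid><pid>Y</pid></pids>")
from_children = [pid.text for pid in nested.findall("pid")]

# Attribute-based form: one whitespace-separated string, split by the consumer.
flat = ET.fromstring('<creator pids="X Y"/>')
from_attribute = flat.get("pids", "").split()

print(from_children)   # ['X', 'Y']
print(from_attribute)  # ['X', 'Y']
```

The difference is where the list structure lives: in the markup itself for the element form, and in an agreed splitting rule for the attribute form.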

I think treating attribute values as lists of tokens, split on a specific character, isn't that uncommon a practice in XML. In HTML, the 'class' attribute makes use of this, so much so that it now has its own interface (classList) for adding and removing classes.

I've lashed together a test using XSLT v1.0, processed with xsltproc, which has no issues dealing with multiple whitespace-separated tokens.

Example XSLT (based on OAI-PMH, just because it was close to hand):

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="text()"/>

  <xsl:template match="oai_dc:dc"  xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" >
    <div class="dcdata">
      <h3>Dublin Core Metadata (oai_dc)</h3>
      <table class="dcdata">
        <xsl:apply-templates select="*" />
      </table>
    </div>
  </xsl:template>

  <xsl:template match="dc:creator" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <tr>
      <td class="key">Author or Creator</td>
      <td class="value">
        <xsl:value-of select="."/>
        <xsl:call-template name="split-pids">
          <xsl:with-param name="text" select="normalize-space(./@pids)" />
        </xsl:call-template>
      </td>
    </tr>
  </xsl:template>

  <!-- Recursively split a whitespace-separated list of PIDs, emitting one link per token -->
  <xsl:template name="split-pids">
    <xsl:param name="text"/>
    <xsl:param name="delimiter" select="' '"/>
    <xsl:param name="output-delimiter" select="'&#10;'"/>

    <xsl:if test="$text != ''">
      <xsl:value-of select="$output-delimiter"/>
      <xsl:choose>
        <xsl:when test="contains($text, $delimiter)">

          <xsl:call-template name="link-pid">
            <xsl:with-param name="pid" select="substring-before($text, $delimiter)" />
          </xsl:call-template>
          <!-- recursive call -->
          <xsl:call-template name="split-pids">
            <xsl:with-param name="text" select="substring-after($text, $delimiter)" />
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="link-pid">
            <xsl:with-param name="pid" select="$text" />
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:if>
  </xsl:template>

  <xsl:template name="link-pid">
    <xsl:param name="pid"/>

    <xsl:element name="a">
      <xsl:attribute name="href">
        <xsl:value-of select="$pid"/>
      </xsl:attribute>
      <xsl:value-of select="$pid"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

Example XML:

<?xml version='1.0' encoding='UTF-8'?>
<metadata>
  <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <dc:title>Geochemical Sources and Availability of Amidophosphates on the Early Earth</dc:title>
    <dc:creator pids="https://orcid.org/0000-1234-1234-5432 https://isni.org/ABCD-ZYZ">Gibard, C</dc:creator>
    <dc:creator pids=" ">Gorrell, IB</dc:creator>
    <dc:creator pids="">Jiménez, EI</dc:creator>
    <dc:creator pids="https://orcid.org/0000-0002-2553-766X">Kee, TP</dc:creator>
    <dc:creator pids="https://orcid.org/0000-9999-9999-9875 ">Pasek, MA</dc:creator>
    <dc:creator>Krishnamurthy, R</dc:creator>
    <dc:description>Phosphorylation of (pre)biotically relevant molecules in aqueous medium ... &lt;snip&gt; ... aqueous environments on early earth.</dc:description>
    <dc:publisher>Wiley</dc:publisher>
    <dc:date>2019-06-11</dc:date>
    <dc:type>Article</dc:type>
    <dc:type>NonPeerReviewed</dc:type>
    <dc:format>text</dc:format>
    <dc:identifier>https://eprints.whiterose.ac.uk/145174/1/Gibard_et_al-2019-Angewandte_Chemie_International_Edition.pdf</dc:identifier>
    <dc:identifier>   Gibard, C, Gorrell, IB, Jiménez, EI et al. (3 more authors)  (2019) Geochemical Sources and Availability of Amidophosphates on the Early Earth.  Angewandte Chemie International Edition, 58 (24).    pp. 8151-8155.  ISSN 1433-7851     </dc:identifier>
    <dc:relation>https://eprints.whiterose.ac.uk/145174/</dc:relation>
  </oai_dc:dc>
</metadata>

Example output:

<?xml version="1.0"?>
<div xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" class="dcdata">
  <h3>Dublin Core Metadata (oai_dc)</h3>
  <table class="dcdata">
    <tr xmlns:dc="http://purl.org/dc/elements/1.1/">
      <td class="key">Author or Creator</td>
      <td class="value">Gibard, C
<a href="https://orcid.org/0000-1234-1234-5432">https://orcid.org/0000-1234-1234-5432</a>
<a href="https://isni.org/ABCD-ZYZ">https://isni.org/ABCD-ZYZ</a></td>
    </tr>
    <tr xmlns:dc="http://purl.org/dc/elements/1.1/">
      <td class="key">Author or Creator</td>
      <td class="value">Gorrell, IB</td>
    </tr>
    <tr xmlns:dc="http://purl.org/dc/elements/1.1/">
      <td class="key">Author or Creator</td>
      <td class="value">Jiménez, EI</td>
    </tr>
    <tr xmlns:dc="http://purl.org/dc/elements/1.1/">
      <td class="key">Author or Creator</td>
      <td class="value">Kee, TP
<a href="https://orcid.org/0000-0002-2553-766X">https://orcid.org/0000-0002-2553-766X</a></td>
    </tr>
    <tr xmlns:dc="http://purl.org/dc/elements/1.1/">
      <td class="key">Author or Creator</td>
      <td class="value">Pasek, MA
<a href="https://orcid.org/0000-9999-9999-9875">https://orcid.org/0000-9999-9999-9875</a></td>
    </tr>
    <tr xmlns:dc="http://purl.org/dc/elements/1.1/">
      <td class="key">Author or Creator</td>
      <td class="value">Krishnamurthy, R</td>
    </tr>
  </table>
</div>

tombaker commented 2 years ago

@jesusbagpuss

> I'm guessing this was the tutorial you found: https://beginnersbook.com/2018/10/xml-attributes/

I pasted the wrong link above - what I had stumbled across was https://www.w3schools.com/xml/xml_dtd_el_vs_attr.asp which, I see, refers specifically to attributes in DTDs, not in XSDs.

> I think treating attributes as tokens, split on a specific character isn't that uncommon a practice in XML.

Thank you for the explanation!

As we have established that including multiple values in an attribute, to be split on a specific character, is not illegal and, on the contrary, is actually practiced, I propose that we close this issue.

tombaker commented 2 years ago

Closing...