tombaker closed this issue 2 years ago.
TL;DR: I think the proposal is workable, although more ardent XML experts might cringe at it. I think the approach is lightweight and pragmatic, and importantly shouldn't break existing integrations.
I'm guessing this was the tutorial you found: https://beginnersbook.com/2018/10/xml-attributes/
As mentioned in the tutorial, you can't have repeated attributes. XML requires each attribute name to appear only once per element, so this is not well-formed:
<element attr="foo" attr="bar">
but you can have this:
<element attr="foo bar">
However, attr then has the single value 'foo bar' rather than the two values 'foo' and 'bar'. I think this is what the tutorial means: the attribute has one value.
If we're representing PIDs as URLs, which can't contain unencoded whitespace, I think we could treat a 'pids' attribute as a series of whitespace-separated words.
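To make the distinction concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. That library is an assumption, since the thread itself only uses XSLT. The sketch shows three things: the parser rejecting a repeated attribute, attr="foo bar" coming back as one string, and the consumer splitting a pids attribute into separate PIDs. The snippets are written as self-closing elements so they parse on their own. is_well_formed and pid_tokens are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if ElementTree's parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

def pid_tokens(element: ET.Element) -> list[str]:
    """Read the (proposed) pids attribute as one string, then split it into PIDs.

    str.split() with no argument splits on runs of whitespace and ignores
    leading/trailing whitespace, so pids="" and pids=" " both give [].
    """
    return element.get("pids", "").split()

# Repeating an attribute makes the document ill-formed.
print(is_well_formed('<element attr="foo" attr="bar"/>'))  # False

# One attribute holding 'foo bar' is fine, but the parser reports a single value...
element = ET.fromstring('<element attr="foo bar"/>')
print(element.get("attr"))  # foo bar

# ...so a whitespace-separated pids attribute has to be split by the consumer.
creator = ET.fromstring(
    '<creator pids="https://orcid.org/0000-1234-1234-5432 '
    'https://isni.org/ABCD-ZYZ">Gibard, C</creator>'
)
print(pid_tokens(creator))
```

Splitting is only safe because a URL-based PID never contains unencoded whitespace, which is the assumption the proposal rests on.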
As we're trying to coerce PIDs into the existing DC elements with as little change as possible, I think a pids="X Y" attribute is workable: existing integrations that don't know about the pids attribute would ignore it.
If we were creating a new metadata schema (or writing a full extension to an existing one), then a more usual representation, e.g. <pids><pid>X</pid><pid>Y</pid></pids>, would probably be the right approach.
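For comparison, here is a sketch of reading the more conventional element form, again assuming Python's xml.etree.ElementTree. Each <pid> child carries exactly one value, so no tokenising is needed. pids_from_elements is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def pids_from_elements(xml_text: str) -> list[str]:
    """Collect the text of each <pid> child of a <pids> wrapper element."""
    pids = ET.fromstring(xml_text)
    values = []
    for pid in pids.findall("pid"):
        # Each <pid> holds one value; skip empty elements rather than emit "".
        text = (pid.text or "").strip()
        if text:
            values.append(text)
    return values

print(pids_from_elements("<pids><pid>X</pid><pid>Y</pid></pids>"))  # ['X', 'Y']
```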
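I think treating an attribute value as a list of tokens split on a specific character isn't that uncommon a practice in XML. XML itself has precedents: DTD attribute types such as IDREFS and NMTOKENS, and XML Schema list types (xs:list), hold whitespace-separated lists. In HTML, the 'class' attribute works this way, so much so that it now has its own interface (classList) to add and remove classes.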
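By analogy with classList, a producer could treat pids as a token list with add/remove helpers. This is a minimal sketch assuming Python's xml.etree.ElementTree. add_pid and remove_pid are hypothetical helpers and are not part of the proposal or of any library. The sketch refuses tokens containing whitespace, since those could not be split back out. As a design choice, it also drops the attribute once it becomes empty rather than leaving pids="" behind.

```python
import xml.etree.ElementTree as ET

def add_pid(element: ET.Element, pid: str) -> None:
    """Add pid to the whitespace-separated pids attribute if not already present."""
    if not pid or any(ch.isspace() for ch in pid):
        raise ValueError(f"not a usable PID token: {pid!r}")
    tokens = element.get("pids", "").split()
    if pid not in tokens:
        tokens.append(pid)
    element.set("pids", " ".join(tokens))

def remove_pid(element: ET.Element, pid: str) -> None:
    """Remove pid from the pids attribute, dropping the attribute when it becomes empty."""
    tokens = [t for t in element.get("pids", "").split() if t != pid]
    if tokens:
        element.set("pids", " ".join(tokens))
    else:
        element.attrib.pop("pids", None)

creator = ET.Element("creator")
creator.text = "Kee, TP"
add_pid(creator, "https://orcid.org/0000-0002-2553-766X")
add_pid(creator, "https://orcid.org/0000-0002-2553-766X")  # duplicate is ignored
print(creator.get("pids"))  # https://orcid.org/0000-0002-2553-766X
```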
I've lashed together a test using XSLT v1.0, processed with xsltproc, which has no issues dealing with multiple whitespace-separated tokens.
Example XSLT (based on OAI-PMH, just because it was close to hand):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<!-- Suppress the text of elements that have no template of their own (e.g. dc:title) -->
<xsl:template match="text()"/>
<xsl:template match="oai_dc:dc" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" >
<div class="dcdata">
<h3>Dublin Core Metadata (oai_dc)</h3>
<table class="dcdata">
<xsl:apply-templates select="*" />
</table>
</div>
</xsl:template>
<xsl:template match="dc:creator" xmlns:dc="http://purl.org/dc/elements/1.1/">
<tr>
<td class="key">Author or Creator</td>
<td class="value">
<xsl:value-of select="."/>
<xsl:call-template name="split-pids">
<xsl:with-param name="text" select="normalize-space(./@pids)" />
</xsl:call-template>
</td>
</tr>
</xsl:template>
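<!-- Split $text on $delimiter and emit one link per token, each preceded by $output-delimiter; recurses on the remainder -->
<xsl:template name="split-pids">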
<xsl:param name="text"/>
<xsl:param name="delimiter" select="' '"/>
<xsl:param name="output-delimiter" select="' '"/>
<xsl:if test="$text != ''">
<xsl:value-of select="$output-delimiter"/>
<xsl:choose>
<xsl:when test="contains($text, $delimiter)">
<xsl:call-template name="link-pid">
<xsl:with-param name="pid" select="substring-before($text, $delimiter)" />
</xsl:call-template>
<!-- recursive call -->
<xsl:call-template name="split-pids">
<xsl:with-param name="text" select="substring-after($text, $delimiter)" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:call-template name="link-pid">
<xsl:with-param name="pid" select="$text" />
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
</xsl:template>
<xsl:template name="link-pid">
<xsl:param name="pid"/>
<xsl:element name="a">
<xsl:attribute name="href">
<xsl:value-of select="$pid"/>
</xsl:attribute>
<xsl:value-of select="$pid"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
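To check the recursion outside an XSLT processor, here is a rough Python mirror of the split-pids template, assuming Python 3.9+. normalize_space follows XPath 1.0's definition, where XML whitespace is space, tab, CR and LF. str.partition plays the role of substring-before/substring-after. The last line shows why the stylesheet calls normalize-space() first: without it, a leading space yields an empty token, and so an empty link.

```python
import re

def normalize_space(text: str) -> str:
    """XPath 1.0 normalize-space(): collapse runs of XML whitespace, trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")

def split_pids(text: str, delimiter: str = " ") -> list[str]:
    """Mirror of the split-pids template: recurse on substring-after()."""
    if text == "":
        return []                                   # <xsl:if test="$text != ''">
    if delimiter in text:                           # contains($text, $delimiter)
        head, _, tail = text.partition(delimiter)   # substring-before / substring-after
        return [head] + split_pids(tail, delimiter)
    return [text]                                   # <xsl:otherwise>

print(split_pids(normalize_space("https://orcid.org/0000-9999-9999-9875 ")))
print(split_pids(" X"))  # ['', 'X'] when normalize-space is skipped
```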
Example XML:
<?xml version='1.0' encoding='UTF-8'?>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title>Geochemical Sources and Availability of Amidophosphates on the Early Earth</dc:title>
<dc:creator pids="https://orcid.org/0000-1234-1234-5432 https://isni.org/ABCD-ZYZ">Gibard, C</dc:creator>
<dc:creator pids=" ">Gorrell, IB</dc:creator>
<dc:creator pids="">Jiménez, EI</dc:creator>
<dc:creator pids="https://orcid.org/0000-0002-2553-766X">Kee, TP</dc:creator>
<dc:creator pids="https://orcid.org/0000-9999-9999-9875 ">Pasek, MA</dc:creator>
<dc:creator>Krishnamurthy, R</dc:creator>
<dc:description>Phosphorylation of (pre)biotically relevant molecules in aqueous medium ... [snip] ... aqueous environments on early earth.</dc:description>
<dc:publisher>Wiley</dc:publisher>
<dc:date>2019-06-11</dc:date>
<dc:type>Article</dc:type>
<dc:type>NonPeerReviewed</dc:type>
<dc:format>text</dc:format>
<dc:identifier>https://eprints.whiterose.ac.uk/145174/1/Gibard_et_al-2019-Angewandte_Chemie_International_Edition.pdf</dc:identifier>
<dc:identifier> Gibard, C, Gorrell, IB, Jiménez, EI et al. (3 more authors) (2019) Geochemical Sources and Availability of Amidophosphates on the Early Earth. Angewandte Chemie International Edition, 58 (24). pp. 8151-8155. ISSN 1433-7851 </dc:identifier>
<dc:relation>https://eprints.whiterose.ac.uk/145174/</dc:relation></oai_dc:dc>
</metadata>
Example output:
<?xml version="1.0"?>
<div xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" class="dcdata">
<h3>Dublin Core Metadata (oai_dc)</h3>
<table class="dcdata">
<tr xmlns:dc="http://purl.org/dc/elements/1.1/">
<td class="key">Author or Creator</td>
<td class="value">Gibard, C
<a href="https://orcid.org/0000-1234-1234-5432">https://orcid.org/0000-1234-1234-5432</a>
<a href="https://isni.org/ABCD-ZYZ">https://isni.org/ABCD-ZYZ</a></td>
</tr>
<tr xmlns:dc="http://purl.org/dc/elements/1.1/">
<td class="key">Author or Creator</td>
<td class="value">Gorrell, IB</td>
</tr>
<tr xmlns:dc="http://purl.org/dc/elements/1.1/">
<td class="key">Author or Creator</td>
<td class="value">Jiménez, EI</td>
</tr>
<tr xmlns:dc="http://purl.org/dc/elements/1.1/">
<td class="key">Author or Creator</td>
<td class="value">Kee, TP
<a href="https://orcid.org/0000-0002-2553-766X">https://orcid.org/0000-0002-2553-766X</a></td>
</tr>
<tr xmlns:dc="http://purl.org/dc/elements/1.1/">
<td class="key">Author or Creator</td>
<td class="value">Pasek, MA
<a href="https://orcid.org/0000-9999-9999-9875">https://orcid.org/0000-9999-9999-9875</a></td>
</tr>
<tr xmlns:dc="http://purl.org/dc/elements/1.1/">
<td class="key">Author or Creator</td>
<td class="value">Krishnamurthy, R</td>
</tr>
</table>
</div>
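As a cross-check of the output above, the same creator/PID pairs can be pulled out with a short Python sketch, assuming xml.etree.ElementTree. RECORD is an excerpt of the example record containing only the dc:creator elements. creator_rows is a hypothetical helper. Because the pids attribute is unprefixed, it is in no namespace, so creator.get("pids") finds it directly.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Excerpt of the example record above (creators only).
RECORD = """<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator pids="https://orcid.org/0000-1234-1234-5432 https://isni.org/ABCD-ZYZ">Gibard, C</dc:creator>
  <dc:creator pids=" ">Gorrell, IB</dc:creator>
  <dc:creator pids="">Jiménez, EI</dc:creator>
  <dc:creator pids="https://orcid.org/0000-0002-2553-766X">Kee, TP</dc:creator>
  <dc:creator pids="https://orcid.org/0000-9999-9999-9875 ">Pasek, MA</dc:creator>
  <dc:creator>Krishnamurthy, R</dc:creator>
</oai_dc:dc>
</metadata>"""

def creator_rows(xml_text: str) -> list[tuple[str, list[str]]]:
    """Return (creator name, list of PIDs) for each dc:creator in the record."""
    root = ET.fromstring(xml_text)
    rows = []
    for creator in root.findall("oai_dc:dc/dc:creator", NS):
        # A missing, empty or whitespace-only pids attribute yields no PIDs.
        rows.append(((creator.text or "").strip(), creator.get("pids", "").split()))
    return rows

for name, pids in creator_rows(RECORD):
    print(name, pids)
```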
@jesusbagpuss
I'm guessing this was the tutorial you found: https://beginnersbook.com/2018/10/xml-attributes/
I pasted the wrong link above: what I had stumbled across was https://www.w3schools.com/xml/xml_dtd_el_vs_attr.asp, which, I see, refers specifically to attributes in DTDs, not in XSDs.
I think treating attributes as tokens, split on a specific character isn't that uncommon a practice in XML.
Thank you for the explanation!
Since we have established that putting multiple values in an attribute, to be split on a specific character, is not illegal and is in fact used in practice, I propose that we close this issue.
Closing...
In 2018, the discussion of the PID proposal raised the possibility that a PID attribute could have multiple values.
However, I stumbled across the assertion, in an XML tutorial, that attributes "cannot contain multiple values", while child elements can.
Is this true about XML attributes in general?