cboettig / eml2

:package: A list-based rewrite of higher-level functions from EML

backslash in id of custom unit? #18

Open scelmendorf opened 6 years ago

scelmendorf commented 6 years ago

Setting custom units (can use example here): https://github.com/cboettig/eml2/blob/master/EML_vignettes/working-with-units.Rmd

The attributeList looks fine in R, but the 'id' seems to acquire a leading slash when it's written out to EML, i.e. <unit id="/speciesPerSquareMeter" below, where I think it should be <unit id="speciesPerSquareMeter"

<?xml version="1.0" encoding="UTF-8"?>
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1" packageId="f8420ada-99a7-11e8-92c7-c71212260c6c" system="uuid" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1/ eml.xsd">
  <dataset>
    <title>A Mimimal Valid EML Dataset</title>
    <creator>
      <individualName>
        <givenName>Carl</givenName>
        <surName>Boettiger</surName>
      </individualName>
    </creator>
    <contact>
      <individualName>
        <givenName>Carl</givenName>
        <surName>Boettiger</surName>
      </individualName>
    </contact>
  </dataset>
  <additionalMetadata>
    <metadata>
      <unitList>
        <unit id="/speciesPerSquareMeter" multiplierToSI="1" name="speciesPerSquareMeter" parentSI="numberPerSquareMeter" unitType="arealDensity">
          <description>number of species per square meter</description>
        </unit>
      </unitList>
    </metadata>
  </additionalMetadata>
</eml:eml>

Also curious about the choice to omit all the stmml info in the units section: should this still be in there, or does it not matter? E.g., in EML the unitList line renders as

<unitList xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

whereas in eml2 it's simply

<unitList>

cboettig commented 6 years ago

Thanks for the bug report. I'm traveling at the moment but will try to fix this next week.

cboettig commented 6 years ago

Thanks for catching this one. The slash issue in the id turned out to be a (somewhat subtle) bug on the emld side; it should be fixed now if you install the latest emld from GitHub. (It has to do with how id is interpreted in EML versus the somewhat stricter rule for the JSON-LD tech I use under the hood in emld, but it's fixed now, I think. This impacted all id elements that were not URIs.)
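For anyone wanting to confirm the fix on their own output, a minimal sketch along these lines could flag unit ids that still start with a slash. It is written in Python with only the standard library rather than R, it is not part of eml2 or emld, and the function name `find_slash_prefixed_unit_ids` and the trimmed-down `SAMPLE` document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_slash_prefixed_unit_ids(eml_text):
    """Return the ids of <unit> elements whose id starts with '/'.

    Hypothetical helper for illustration; not an eml2/emld function.
    """
    root = ET.fromstring(eml_text)
    bad = []
    for elem in root.iter():
        # Compare on the local name so the check works whether or not
        # the unit element is namespace-qualified.
        local = elem.tag.rsplit("}", 1)[-1]
        unit_id = elem.get("id")
        if local == "unit" and unit_id is not None and unit_id.startswith("/"):
            bad.append(unit_id)
    return bad


# Trimmed-down stand-in for the output in the report (assumed, not real output).
SAMPLE = """<eml><additionalMetadata><metadata><unitList>
  <unit id="/speciesPerSquareMeter" name="speciesPerSquareMeter"/>
  <unit id="numberPerSquareMeter" name="numberPerSquareMeter"/>
</unitList></metadata></additionalMetadata></eml>"""

print(find_slash_prefixed_unit_ids(SAMPLE))  # ['/speciesPerSquareMeter']
```

An empty list after reinstalling emld and regenerating the file would suggest the ids are being written as expected.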

Regarding the stmml info, you'll see that the eml2 output just defines the stmml namespace on the top-level eml element, so it's inherited by the unitList. Repeating it would seem a bit redundant; I wanted all the namespaces together at the top level for convenience. I believe both are valid, but practices vary a bit, and I'm not sure if one is considered more of a 'best practice'. Maybe @mbjones or @amoeba has a more informed take on where to define namespaces.

amoeba commented 6 years ago

Yeah, AFAIK namespace lookups bubble upwards to the nearest ancestor with a matching declaration, and XML lets you be way more flexible than you probably should be when serializing them. The sanest approach I've seen is to declare namespaces only at the root of the tree and avoid complex namespace hierarchies.

mbjones commented 6 years ago

Yes, convention is to define namespaces on the root element, but they can be overridden at any child. XML processors handle this namespace scoping for you, and generally put the declaration on the parent closest to the root that shares that namespace. There are complex rules on namespace definitions too and those are strongly affected by the elementFormDefault attribute and other similar switches. See https://stackoverflow.com/questions/1463138/what-does-elementformdefault-do-in-xsd
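The scoping behaviour described in this thread can be checked with a small sketch. It uses Python's standard-library `xml.etree.ElementTree` rather than eml2, and the `doc`/`wrapper` element names, the `http://example.org/other` URI, and the `unit_tag` helper are invented for illustration. A prefix declared on the root resolves identically whether or not a descendant repeats the declaration, while a child that rebinds the prefix changes what it means below that point.

```python
import xml.etree.ElementTree as ET

STMML = "http://www.xml-cml.org/schema/stmml-1.1"

# Prefix declared once, on the root element only.
ROOT_ONLY = f"""<doc xmlns:stmml="{STMML}">
  <wrapper><stmml:unit id="u1"/></wrapper>
</doc>"""

# Same declaration repeated on a child (redundant but valid).
REPEATED = f"""<doc xmlns:stmml="{STMML}">
  <wrapper xmlns:stmml="{STMML}"><stmml:unit id="u1"/></wrapper>
</doc>"""

# Child rebinds the prefix to a different (made-up) namespace URI.
OVERRIDDEN = f"""<doc xmlns:stmml="{STMML}">
  <wrapper xmlns:stmml="http://example.org/other"><stmml:unit id="u1"/></wrapper>
</doc>"""


def unit_tag(xml_text):
    """Return the fully qualified tag of the unit element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # root[0] is <wrapper>, root[0][0] is the prefixed unit element.
    return root[0][0].tag


print(unit_tag(ROOT_ONLY) == unit_tag(REPEATED))  # True
print(unit_tag(OVERRIDDEN))  # {http://example.org/other}unit
```

ElementTree reports tags as `{namespace-uri}localname`, so the comparison is on the resolved namespace rather than on the prefix text, which is why the first two documents are equivalent to a namespace-aware parser.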