Open mbjones opened 3 years ago
We've been using xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://nis.lternet.edu/schemas/EML/eml-2.2.0/xsd/eml.xsd". Is that second part, the nis.lternet.edu portion, not needed?
Here's a more complete example. Hmm, I also notice we have @xmlns:eml with that same https://eml.ecoinformatics.org/eml-2.2.0 content, which seems redundant.
<eml:eml
xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.2"
xmlns:d1v1="NULL"
packageId="knb-lter-ble.18.2"
xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://nis.lternet.edu/schemas/EML/eml-2.2.0/xsd/eml.xsd"
system="ble"/>
@twhiteaker Thanks for following up. The xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" attribute associates the "eml" prefix with the right namespace, and is what allows the root <eml:eml> element (among others) to be properly namespaced. So it is needed.
The xsi:schemaLocation attribute takes two values: the namespace, and the schemaLocation URI. So, your example says that whenever the parser finds an element in the https://eml.ecoinformatics.org/eml-2.2.0 namespace, it can find the xsd file associated with that namespace at the location https://nis.lternet.edu/schemas/EML/eml-2.2.0/xsd/eml.xsd. This is the intended usage. But I would argue that it would be better to use xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://eml.ecoinformatics.org/eml-2.2.0", which says to use the official location for the xsd file. Or, better yet, omit it altogether for the reasons I cited above.
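To make the namespace/location pairing concrete, here is a minimal Python sketch (standard library only) that reads xsi:schemaLocation from the root element and splits it into pairs. The function name schema_location_pairs and the trimmed example document are illustrative assumptions, not part of any EML tooling.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_location_pairs(xml_text):
    """Return {namespace: location} parsed from the root's xsi:schemaLocation.

    Hypothetical helper for illustration. Returns an empty dict when the
    attribute is absent.
    """
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace-uri}localname".
    raw = root.get(f"{{{XSI_NS}}}schemaLocation", "")
    tokens = raw.split()
    if len(tokens) % 2 != 0:
        raise ValueError("xsi:schemaLocation must hold namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


# Trimmed version of the example root element from this thread.
EXAMPLE = """<eml:eml
    xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    packageId="knb-lter-ble.18.2"
    xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://nis.lternet.edu/schemas/EML/eml-2.2.0/xsd/eml.xsd"
    system="ble"/>"""

print(schema_location_pairs(EXAMPLE))
```

The first token of each pair is the namespace and the second is only a hint about where a schema for it might live, which is why a client is free to ignore it.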
@mbjones Thanks for the clarification. I'm a minimalist so I'm all for omitting xsi:schemaLocation. If we do that...
I think "best practice" would be for clients to provide their own, verified copies of the schema.
A client would be a program consuming EML. So, if a data publisher omits xsi:schemaLocation, they don't have to then provide a copy of the schema. It's up to the client. Did I get that right? I'm trying to determine what additional actions a data publisher may need to take if we omit xsi:schemaLocation.
Also, if we leave out xsi:schemaLocation, then we can also leave out xmlns:xsi, at least in my example above since I don't mention that namespace anywhere else.
Hi @twhiteaker -- yeah, if you don't reference a namespace prefix like xsi in your document, then you can omit it.
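One rough way to check whether a prefix such as xsi is still referenced is sketched below in Python (standard library only). The helper name unused_namespace_prefixes is hypothetical, and the check only looks at element and attribute names; a prefix that appears solely inside an attribute value (for example in an xsi:type value) would be reported as unused even though it is needed, so treat the result as a hint.

```python
import io
import xml.etree.ElementTree as ET


def unused_namespace_prefixes(xml_text):
    """List declared prefixes whose namespace is used by no element or attribute name.

    Hypothetical helper for illustration only.
    """
    data = xml_text.encode("utf-8")

    # "start-ns" events report every xmlns declaration as (prefix, uri).
    declared = {}
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=("start-ns",)):
        declared[prefix] = uri

    # Collect the namespace part of every "{uri}local" element/attribute name.
    used = set()
    for elem in ET.fromstring(data).iter():
        for name in [elem.tag, *elem.attrib]:
            if isinstance(name, str) and name.startswith("{"):
                used.add(name[1:].split("}", 1)[0])

    return sorted(p for p, uri in declared.items() if uri not in used)


# Trimmed root element with xsi:schemaLocation already removed.
WITHOUT_SCHEMA_LOCATION = """<eml:eml
    xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    packageId="knb-lter-ble.18.2"
    system="ble"/>"""

print(unused_namespace_prefixes(WITHOUT_SCHEMA_LOCATION))
```

For the trimmed document above, xsi is the only declared prefix that no element or attribute name uses, so its xmlns:xsi declaration could be dropped.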
And yes, I think you got it right on client responsibilities. In general, a client that is interpreting documents that it gets from the wild needs to control the schemas that are used to validate those documents. So the data providers' main job is to properly reference the namespace in their root element and in their document, and the client's job is to find a trusted copy of the schema that defines that namespace. Arbitrary URIs on the interwebs are not trusted sources of those schemas (we find many repositories that have made breaking changes to XSD documents and then posted them as if they were the original namespace). So, our client tooling is built so that we provide our own copies of the schemas, which we get from the authoritative source (e.g., eml.ecoinformatics.org). Most client tools (like XML parsers and editors) have features to register your local trusted copy of an xsd for the tool to use (these are typically called "XML Catalogs").
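As a rough illustration of that client-side responsibility, the Python sketch below picks a schema from a local registry keyed by the root element's namespace and never consults xsi:schemaLocation. The registry contents, the schemas/ directory layout, and the function name trusted_schema_for are all assumptions made for this example; real tools would normally do this through an XML Catalog, and the Python standard library does not itself validate against XSD, so the returned path would be handed to a separate validator.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical registry: namespace -> locally stored, verified copy of the XSD.
# The local path is an assumed layout, not a location defined by EML.
TRUSTED_SCHEMAS = {
    "https://eml.ecoinformatics.org/eml-2.2.0": Path("schemas/eml-2.2.0/xsd/eml.xsd"),
}


def trusted_schema_for(xml_text, registry=TRUSTED_SCHEMAS):
    """Return the trusted local schema path for the document's root namespace.

    Any xsi:schemaLocation in the document is deliberately ignored.
    """
    root = ET.fromstring(xml_text)
    if not root.tag.startswith("{"):
        raise ValueError("root element is not namespaced")
    namespace = root.tag[1:].split("}", 1)[0]
    try:
        return registry[namespace]
    except KeyError:
        raise LookupError(f"no trusted schema registered for {namespace!r}") from None


# A document whose schemaLocation points somewhere the client does not trust.
UNTRUSTED_HINT = """<eml:eml
    xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://example.org/modified/eml.xsd"
    packageId="knb-lter-ble.18.2"
    system="ble"/>"""

print(trusted_schema_for(UNTRUSTED_HINT))
```

Because the lookup is keyed on the namespace alone, the result is the same whether the publisher includes, omits, or mis-sets xsi:schemaLocation.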
@cgries Do you know if leaving out schemaLocation or xmlns:xsi will break EDI's congruency checker?
@twhiteaker, no, EDI's congruency checker would not break, but it would add another step in Oxygen if you use that.
I'm leaning toward omitting schemaLocation. Anyone in favor of using xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://eml.ecoinformatics.org/eml-2.2.0" instead, or something else?
@twhiteaker CAP uses the trusted source that Matt detailed and that you put in your last comment (i.e., xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd"). I am intrigued by the suggestion to omit it altogether but will stick with that for the time being (and continue to think about this).
Per @cgries's comment, I would find it mildly annoying to have it break Oxygen, as that is my preferred "my xml is not valid but I can't figure out what I did wrong" tool (I find the R EML package error messages there not terribly informative). If you DO omit it, what is the workaround for using Oxygen? Can we add that to the BP? There also may be some overlap with the issues here: https://github.com/ropensci/EML/issues/292
@scelmendorf When I am working in Oxygen with EML (and other) schemas, I configure oxygen to use my local copy of the schemas, rather than trust that the document author provided a link to an unmodified version. Configuration is described here: https://www.oxygenxml.com/doc/versions/23.1/ug-editor/topics/using-XML-Catalogs.html If you set it up once, it will work with all EML documents, regardless of how people set schemaLocation.
@scelmendorf and others with Oxygen concerns, does @mbjones's strategy work for you?
Trying now. Most likely this is user error or a failure to follow the instructions, but I added the schema to Oxygen under preferences->xml->xml catalog, then deleted the xsi:schemaLocation from the eml xpath in my test document to see how this works. It doesn't appear to be validating the XML now; e.g., I can put all sorts of bogus bits in there and it still says it's perfectly valid.
Great guide!
I note that section 4.1 recommends using schemaLocation with the XPath /eml:eml/@schemaLocation to help clients learn where to download schemas. Two issues:
1. The XPath should be /eml:eml/@xsi:schemaLocation, as the attribute is part of the xsi namespace. It also would need to have the xsi namespace defined in an xmlns:xsi attribute on the root element as well.
2. If a client relies on xsi:schemaLocation for a document from elsewhere, it could point at a modified version of the schema, which might make the document actually invalid with respect to the official schemas. We have seen this frequently in DataONE with sites publishing their own variant schemas for ISO under the official namespace, and thereby losing the benefits of standardization. These nuances may be less compelling for people that want to quickly load a schema, so I understand if you want to keep the recommendation, but in our data centers we follow the best practice of explicitly omitting xsi:schemaLocation.