NCEAS / eml

Ecological Metadata Language (EML)
https://eml.ecoinformatics.org/
GNU General Public License v2.0

Invalid EML in what's new documentation chapter #393

Closed twhiteaker closed 1 year ago

twhiteaker commented 1 year ago

Docs on the citation list type show literatureCited with both citation elements and a bibtex element as direct children. If I'm interpreting the schema correctly, literatureCited cannot include both of those elements as direct children. It can be one to many citation elements, or one to many bibtex elements.

Perhaps the example in the docs should be split into two examples (one for each child type).

mbjones commented 1 year ago

Hey @twhiteaker thanks for the report. I'm not quite following you though. The schema allows CitationListType to be a repeating choice of citation or bibtex, and each citation is a CitationType that allows a choice between either a traditional DocBook-inspired list of fields or the new bibtex element as well. So, with those two things, the examples look valid to me. The example data paper file (https://github.com/NCEAS/eml/blob/main/src/test/resources/eml-data-paper.xml) also uses this structure and, I think, passes the EML validation tests.

I'm probably not understanding the exact issue you are highlighting, so could you be more explicit showing a specific EML document that reproducibly doesn't validate and we can talk about that concretely?

twhiteaker commented 1 year ago

@mbjones I haven't produced a full EML document with this example pattern. It's my understanding that xs:choice lets you pick one or the other of the choices, but not both. So, CitationListType can include, as direct children, citation or bibtex but not both. The example below which includes citation elements is fine since bibtex is not a direct child of literatureCited but rather citation.

<literatureCited>
    <citation>
        <bibtex>
            @article{fegraus_2005,
                 title = {Maximizing the {Value} of {Ecological} {Data} with {Structured} {Metadata}: {An} {Introduction} to {Ecological} {Metadata} {Language} ({EML}) and {Principles} for {Metadata} {Creation}},
                 journal = {Bulletin of the Ecological Society of America},
                 author = {Fegraus, Eric H. and Andelman, Sandy and Jones, Matthew B. and Schildhauer, Mark},
                 year = {2005},
                 pages = {158--168}
             }
        </bibtex>
    </citation>
    <citation>
        <title>Title for a paper that used this dataset.</title>
        <creator>
            <individualName>
                <givenName>Mark</givenName>
                <surName>Jarkady</surName>
            </individualName>
        </creator>
        <pubDate>2017</pubDate>
        <article>
            <journal>EcoSphere</journal>
            <publicationPlace>https://doi.org/10.1002/ecs2.2166</publicationPlace>
        </article>
    </citation>
</literatureCited>

However, the example below would be invalid, because citation and bibtex are siblings and direct children of literatureCited.

<literatureCited>
    <bibtex>
        @article{fegraus_2005,
             title = {Maximizing the {Value} of {Ecological} {Data} with {Structured} {Metadata}: {An} {Introduction} to {Ecological} {Metadata} {Language} ({EML}) and {Principles} for {Metadata} {Creation}},
             journal = {Bulletin of the Ecological Society of America},
             author = {Fegraus, Eric H. and Andelman, Sandy and Jones, Matthew B. and Schildhauer, Mark},
             year = {2005},
             pages = {158--168}
             }
    </bibtex>
    <citation>
        <title>Title for a paper that used this dataset.</title>
        <creator>
            <individualName>
                <givenName>Mark</givenName>
                <surName>Jarkady</surName>
            </individualName>
        </creator>
        <pubDate>2017</pubDate>
        <article>
            <journal>EcoSphere</journal>
            <publicationPlace>https://doi.org/10.1002/ecs2.2166</publicationPlace>
        </article>
    </citation>
</literatureCited>

Does that help?

mbjones commented 1 year ago

I think the confusion here is that literatureCited is of type CitationListType, and the xs:choice itself in the CitationListType XML Schema definition is repeatable (via maxOccurs="unbounded"):

<xs:complexType name="CitationListType">
    <xs:choice minOccurs="1" maxOccurs="unbounded">
        <xs:element name="citation" type="CitationType" />
        <xs:element name="bibtex" type="xs:string" />
    </xs:choice>
</xs:complexType>

In DTD syntax this would be expressed as (citation | bibtex)+. This schema construct allows any number of elements from the choice list in any order, and that is often the intent of a repeatable choice. We use it all over the place in EML. So, in this case, we could have a series of elements like citation, citation, bibtex, citation or bibtex, bibtex, citation, and both are equally valid. I think if you put your example documents into an EML doc, you'd find they validate fine.
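
To make the (citation | bibtex)+ reading concrete, here is a rough Python sketch that checks only the order of the direct children of a literatureCited element against that content model. It is an illustration, not part of the EML tooling and not a real XML Schema validator: the function name children_match_choice and the sample snippets are hypothetical, and it ignores namespaces and the contents of each child.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: we mimic only the content model (citation | bibtex)+ for the
# direct children of the root element. This is NOT a schema validator.
# Child tag names are joined with trailing spaces so the regex can match
# whole names rather than substrings.
CONTENT_MODEL = re.compile(r"(?:citation |bibtex )+")


def children_match_choice(xml_text: str) -> bool:
    """Return True if the root's direct children are a non-empty sequence
    of citation and/or bibtex elements, in any order."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(child_names) is not None


# Hypothetical snippets, heavily shortened from the examples in this thread.
mixed = (
    "<literatureCited>"
    "<bibtex>@article{a}</bibtex><citation/><bibtex>@article{b}</bibtex>"
    "</literatureCited>"
)
only_citations = "<literatureCited><citation/><citation/></literatureCited>"
empty = "<literatureCited/>"
unexpected = "<literatureCited><citation/><title>x</title></literatureCited>"

for name, doc in [("mixed", mixed), ("only_citations", only_citations),
                  ("empty", empty), ("unexpected", unexpected)]:
    print(name, children_match_choice(doc))
```

Under this sketch, the mixed and citation-only sequences are accepted, while an empty list (minOccurs="1" requires at least one child) and a list containing any other element are rejected. Running the real documents through an XML Schema validator is still the authoritative check.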

twhiteaker commented 1 year ago

Ah, I see. I was confused about how xs:choice works when maxOccurs="unbounded". Thanks for clearing that up!