gbif / eml-profile

GBIF EML profile
Include structured citations in EML #4

Open mdoering opened 1 year ago

mdoering commented 1 year ago

For the custom GBIF extension of EML it would be good to also have a structured citation in addition to the citation string and identifier. That applies to both the main citation as well as the bibliography.

It will allow the registry to be able to search and facet on journals and publishers, something important to journals participating in publishing treatment articles. For example:

<additionalMetadata>
    <metadata>
        <gbif>
            <citation 
                identifier="http://doi.org/10.5886/zw3aqw"
                type="Dataset"
                author="Brouillet L"
                title="Database of Vascular Plants of Canada (VASCAN)"
                version="37.12"
                issued="2023"
                publisher="Université de Montréal Biodiversity Centre"
                accessed="2023-04-27"
                url="https://doi.org/10.5886/zw3aqw"
            >Brouillet L (2023). Database of Vascular Plants of Canada (VASCAN). Version 37.12. Université de Montréal Biodiversity Centre. Checklist dataset https://doi.org/10.5886/zw3aqw accessed via GBIF.org on 2023-04-27.</citation>
            <bibliography>
                <citation 
                    identifier="http://doi.org/10.1111/j.1095-8339.2009.00996.x"
                    type="ARTICLE-JOURNAL"
                    author="Angiosperm Phylogeny Group"
                    title="An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III"
                    containerTitle="Botanical Journal of the Linnean Society"
                    issued="2009"
                    volume="161"
                    page="105–121"
                >Angiosperm Phylogeny Group (2009) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Botanical Journal of the Linnean Society 161: 105–121. doi: 10.1111/j.1095-8339.2009.00996.x</citation>
                <citation identifier="http://doi.org/10.1111/j.1095-8339.2009.01002.x">Chase MW, Reveal JL (2009) A phylogenetic classification of land plants to accompany APG III. Botanical Journal of the Linnean Society 161 (2): 122–127. doi: 10.1111/j.1095-8339.2009.01002.x</citation>
            </bibliography>
        </gbif>
    </metadata>
</additionalMetadata>
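
As a rough illustration of how the attributes in the example could be read into CSL-JSON-style records, here is a minimal Python sketch. The attribute-to-key mapping (`ATTR_TO_CSL`), the function name `citation_to_csl` and the `_text` key are assumptions made for this sketch, not part of the GBIF EML profile or any GBIF code; the snippet is a trimmed, un-namespaced copy of the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example; the citation string is shortened for brevity.
EML_SNIPPET = """
<additionalMetadata>
  <metadata>
    <gbif>
      <bibliography>
        <citation
            identifier="http://doi.org/10.1111/j.1095-8339.2009.00996.x"
            type="ARTICLE-JOURNAL"
            author="Angiosperm Phylogeny Group"
            containerTitle="Botanical Journal of the Linnean Society"
            issued="2009"
            volume="161"
            page="105–121"
        >Angiosperm Phylogeny Group (2009) APG III.</citation>
      </bibliography>
    </gbif>
  </metadata>
</additionalMetadata>
"""

# Assumed mapping from the proposed attribute names to CSL-JSON variable names.
ATTR_TO_CSL = {
    "type": "type",
    "title": "title",
    "containerTitle": "container-title",
    "publisher": "publisher",
    "version": "version",
    "volume": "volume",
    "issue": "issue",
    "page": "page",
    "url": "URL",
}


def citation_to_csl(elem):
    """Convert one <citation> element into a CSL-JSON-like dict (sketch)."""
    item = {}
    for attr, key in ATTR_TO_CSL.items():
        value = elem.get(attr)
        if value is not None:
            item[key] = value
    if "type" in item:
        # CSL-JSON types are lowercase, e.g. "article-journal" or "dataset".
        item["type"] = item["type"].lower()
    if elem.get("identifier"):
        item["id"] = elem.get("identifier")
    if elem.get("author"):
        # The attribute is a plain string, so it is kept as one literal name.
        item["author"] = [{"literal": elem.get("author")}]
    for date_attr in ("issued", "accessed"):
        if elem.get(date_attr):
            # Dates are kept unparsed in the "raw" form.
            item[date_attr] = {"raw": elem.get(date_attr)}
    # Not a CSL-JSON key: the unstructured citation string from the element text.
    item["_text"] = (elem.text or "").strip()
    return item


root = ET.fromstring(EML_SNIPPET)
items = [
    citation_to_csl(c)
    for c in root.findall("./metadata/gbif/bibliography/citation")
]
```

Keeping the author as a single literal avoids guessing how a free-text author string should be split into family and given names.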

List of main fields corresponding to CSL-JSON:

mdoering commented 1 year ago

This is similar to the source field in the ColDP metadata. @thomasstjerne @timrobertson100 @ahahn-gbif @gsautter does that represent our discussion well?

gsautter commented 1 year ago

@mdoering please have a look at the eml.xml in tb.plazi.org/GgServer/dwca/995CFFC54F0AB7277923CD0E036BB046.zip ... is that what you had in mind?

mdoering commented 1 year ago

Exactly, yes!

<citation author="Mondaca, José; Rebolledo, Guido; Vitali, Francesco" containerTitle="Insecta Mundi" doi="http://doi.org/10.5281/zenodo.7887620" issn="1942-1354" issue="979" issued="2023" page="1-5" title="Stictoleptura cordigera (Füssli, 1775) (Cerambycidae: Lepturinae: Lepturini), a new alien longhorn beetle introduced in Chile" volume="2023">Mondaca, José, Rebolledo, Guido, Vitali, Francesco (2023): Stictoleptura cordigera (Füssli, 1775) (Cerambycidae: Lepturinae: Lepturini), a new alien longhorn beetle introduced in Chile. Insecta Mundi 2023 (979): 1-5, DOI: http://doi.org/10.5281/zenodo.7887620</citation>

The only thing I wonder about is whether the main, single citation element should be used for this or one under bibliography. The main one is the dataset citation (i.e. type="Dataset"), so it's odd if it differs from the other metadata.

gsautter commented 1 year ago

> The only thing I wonder about is whether the main, single citation element should be used for this or one under bibliography. The main one is the dataset citation (i.e. type="Dataset"), so it's odd if it differs from the other metadata.

Well, considering the fact that the citation element has always contained the citation of the source publication, I think that's a rather natural place to put the detail attributes as well ... on top of the fact that we've never used the bibliography element at all in eml.xml so far ... what sorts of alternatives do you have in mind?

mdoering commented 1 year ago

AFAIK GBIF currently ignores the citation element and produces its own citation from the rest of the metadata. From GBIF:

Plazi.org taxonomic treatments database. Stictoleptura cordigera (Füssli, 1775) (Cerambycidae: Lepturinae: Lepturini), a new alien longhorn beetle introduced in Chile. Checklist dataset https://doi.org/10.15468/nkfvu8 accessed via GBIF.org on 2023-05-02.

Your provided citation:

Mondaca, José, Rebolledo, Guido, Vitali, Francesco (2023): Stictoleptura cordigera (Füssli, 1775) (Cerambycidae: Lepturinae: Lepturini), a new alien longhorn beetle introduced in Chile. Insecta Mundi 2023 (979): 1-5, DOI: http://doi.org/10.5281/zenodo.7887620

Not only do the authors differ, but so do the DOI and other parts, because GBIF tries to produce a citation for the dataset rather than for the article. The ColDP metadata.json you already produce for ChecklistBank also includes the structured citation of the article as an entry in the source list, which corresponds to the bibliography list in EML.

I would be interested to hear from @ahahn-gbif and @timrobertson100, but I would recommend placing the pure & structured article citation inside the bibliography section of the EML and actually removing the other one. Basically this means moving the citation element from /eml/additionalMetadata/metadata/gbif/citation to /eml/additionalMetadata/metadata/gbif/bibliography/citation but keep it otherwise as it is now!
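
A minimal Python sketch of that move, assuming a simplified, un-namespaced document (a real eml.xml has a namespaced root element, which this sketch ignores). The function name `move_citation_to_bibliography` and the shortened sample document are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified sample; attributes taken from the citation above, text shortened.
EML_BEFORE = """
<eml>
  <additionalMetadata>
    <metadata>
      <gbif>
        <citation containerTitle="Insecta Mundi" issue="979" issued="2023" page="1-5" volume="2023">Mondaca, José, Rebolledo, Guido, Vitali, Francesco (2023)</citation>
      </gbif>
    </metadata>
  </additionalMetadata>
</eml>
"""


def move_citation_to_bibliography(root):
    """Move gbif/citation to gbif/bibliography/citation; return True if moved."""
    gbif = root.find("./additionalMetadata/metadata/gbif")
    if gbif is None:
        return False
    # find() with a bare tag name only matches direct children of <gbif>.
    citation = gbif.find("citation")
    if citation is None:
        return False
    bibliography = gbif.find("bibliography")
    if bibliography is None:
        # Create the bibliography container if the document has none yet.
        bibliography = ET.SubElement(gbif, "bibliography")
    gbif.remove(citation)
    bibliography.append(citation)  # attributes and text stay unchanged
    return True


root = ET.fromstring(EML_BEFORE)
moved = move_citation_to_bibliography(root)
```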

gsautter commented 1 year ago

> Basically this means moving the citation element from /eml/additionalMetadata/metadata/gbif/citation to /eml/additionalMetadata/metadata/gbif/bibliography/citation but keep it otherwise as it is now!

Easy enough, merely an XML edit (in the eml.xml template) ... have another look at the eml.xml in http://tb.plazi.org/GgServer/dwca/995CFFC54F0AB7277923CD0E036BB046.zip ... is that what you suggested?

mdoering commented 1 year ago

yes, 100% that!

gsautter commented 1 year ago

OK, thanks ... all DwC-As from Plazi should look like that from now on (unless I change back the eml.xml template).

gsautter commented 1 year ago

@mdoering the post kind of implies you need something on top of the citation attributes I added in the past couple of days ... where exactly should those extra attributes go, and what should they be populated with?

mdoering commented 1 year ago

No, this is just to get what we discussed into a) the GBIF EML profile XSD b) let the GBIF registry read and store it and c) allow the dataset search to use some of that, e.g. the journal, publisher or issue date.
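
To illustrate point c), here is a small Python sketch that counts journal, publisher and issue-date values across a set of EML documents. It is not how the GBIF registry or dataset search is implemented; `FACET_ATTRS`, `facet_counts` and the two sample documents are assumptions for this example, and namespaces are ignored.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Attributes assumed to be useful as search facets.
FACET_ATTRS = ("containerTitle", "publisher", "issued")

# Two hypothetical, heavily trimmed EML documents.
SAMPLE_DOCS = [
    """<eml><additionalMetadata><metadata><gbif><bibliography>
         <citation containerTitle="Insecta Mundi" issued="2023">A</citation>
       </bibliography></gbif></metadata></additionalMetadata></eml>""",
    """<eml><additionalMetadata><metadata><gbif><bibliography>
         <citation containerTitle="Insecta Mundi" issued="2022">B</citation>
         <citation>C (unstructured citation only)</citation>
       </bibliography></gbif></metadata></additionalMetadata></eml>""",
]


def facet_counts(eml_documents):
    """Count facet values over all <citation> elements in the given documents."""
    counts = {attr: Counter() for attr in FACET_ATTRS}
    for xml_text in eml_documents:
        root = ET.fromstring(xml_text)
        for citation in root.iter("citation"):
            for attr in FACET_ATTRS:
                value = citation.get(attr)
                if value:
                    # Citations without structured attributes are skipped.
                    counts[attr][value] += 1
    return counts


counts = facet_counts(SAMPLE_DOCS)
```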

gsautter commented 1 year ago

> No, this is just to get what we discussed into a) the GBIF EML profile XSD b) let the GBIF registry read and store it and c) allow the dataset search to use some of that, e.g. the journal, publisher or issue date.

Ah, OK ... take it this one is done from my end, then?

mdoering commented 1 year ago

> > No, this is just to get what we discussed into a) the GBIF EML profile XSD b) let the GBIF registry read and store it and c) allow the dataset search to use some of that, e.g. the journal, publisher or issue date.
>
> Ah, OK ... take it this one is done from my end, then?

YES!