Open dagendresen opened 1 year ago
This kind of reminds me a bit of the Nordic plant uses dataset (https://ipt.gbif.no/resource?r=nhm-plant-uses). That uses the Literature reference extension like this:
For the taxonomy, I don't remember the details exactly but it looks like we used https://dwc.tdwg.org/terms/#dwc:acceptedNameUsage for the current accepted name, and scientificName for the name in the literature. The mapping looks like this:
We are going to publish a literature-based checklist + occurrence dataset. In DwC terms it will be Taxon core + Occurrence extension + References extension.
Doesn't it perhaps make more sense to publish this as 2 datasets? 1 checklist, and 1 occurrence?
Hi Dag, hi Rukaya,
Thanks for your help!
The taxonomy mapping is now clear: scientificName for cited names, acceptedNameUsage for current names.
The scheme I was thinking about looks like the one in the picture. It differs a bit from the datasets published by Plazi, which are mostly records associated with a single publication. As for references, I see no point in splitting them page by page, as is done in the Nordic plant uses example; we don't have the resources for that work. I just want to give a reference for each occurrence.
If I publish two separate datasets, Checklist and Occurrence, as Rukaya suggested, can I somehow keep the connection between them by taxonID? Should this then be explained in the metadata?
can I somehow keep the connection between them by taxonID?
Could you make globally unique taxonIDs? Such as generating a urn:uuid:UUID and reusing the same taxonID in both datasets.
Same for occurrenceID, try to generate a globally unique identifier -- and avoid composite identifiers (where you use the taxonID as part of the occurrenceID string) ;-)
Yes, I would use UUIDs (see https://www.uuidgenerator.net/ to bulk generate them) and then explain in the metadata of both datasets that they are related datasets and they complement each other.
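For anyone who would rather script this than use the web generator, here is a minimal Python sketch of the idea above: one urn:uuid per distinct name, reused as taxonID in both the checklist and the occurrence table, plus an independent (non-composite) occurrenceID for each record. The species names, the row layout and the helper name `assign_taxon_ids` are made up for illustration; only the `uuid` module from the standard library is used.

```python
import uuid


def assign_taxon_ids(scientific_names):
    """Return a dict mapping each distinct name to one urn:uuid string."""
    ids = {}
    for name in scientific_names:
        if name not in ids:
            # uuid4() gives a random (version 4) UUID
            ids[name] = f"urn:uuid:{uuid.uuid4()}"
    return ids


# Hypothetical cited names, one per occurrence record; note the repeat.
names = ["Quercus robur L.", "Betula pendula Roth", "Quercus robur L."]
taxon_ids = assign_taxon_ids(names)

# Checklist dataset: one row per distinct taxon.
checklist = [
    {"taxonID": tid, "scientificName": name}
    for name, tid in taxon_ids.items()
]

# Occurrence dataset: every record gets its own UUID and reuses the
# taxonID from the checklist, so the two datasets stay linked.
occurrences = [
    {
        "occurrenceID": f"urn:uuid:{uuid.uuid4()}",
        "taxonID": taxon_ids[name],
        "scientificName": name,
    }
    for name in names
]
```

The generated identifiers have to be stored and reused on every republication; regenerating them on each export would break the link between the two datasets.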
Could you make globally unique taxonIDs? Such as generating a urn:uuid:UUID and reusing the same taxonID in both datasets.
Yes, sure. I was going to generate them via UUID.
Thank you for the explanations!
Dear @dagendresen and @rukayaj, I have several questions about the dataset in Iryna's photo above. I'm trying to find a balance between a machine-readable and a human-readable dataset. Concerning the Occurrence extension:
Regarding taxonID: could this be the GBIF ID? Species will be repeated in my dataset, so can I put each species' own GBIF identifier in this column, for example https://www.gbif.org/species/212?
Regarding the book ID: could it be the surname and year of publication, for example Lavitska-1949, or a DOI for newer publications? Books, of course, will also be repeated in the dataset.
Thank you for your feedback!
Hi @JuliianaLeshchenko, for the taxonID you could indeed use any of the taxon identifiers that you can see at the bottom of the page on e.g. https://www.gbif.org/species/4352338. You could also use https://www.uuidgenerator.net/ to generate v4 uuids. It doesn't matter really as long as it's unique. You can see in the standards documentation https://dwc.tdwg.org/terms/#dwc:taxonID it just says it has to be something which at minimum is unique within the dataset.
For 'identifier' in the literature, it says this should be the ISBN or DOI or whatever. You can read more about the literature extension here https://rs.gbif.org/extension/gbif/1.0/references.xml. In other words in the last table in Iryna's image, 'id' and 'identifier' can be the same column. You would need a link back to the core table (so either taxonID or occurrenceID).
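As a rough sketch of what that could look like, the Python below builds a references table with one row per occurrence: the first column links back to the core record, and 'identifier' carries the book key or DOI. The occurrence IDs, the citation text and the helper name `reference_rows` are placeholders, and the column names identifier and bibliographicCitation are assumed to match the terms of the references extension, so check them against the extension definition before using this.

```python
import csv
import io

# Hypothetical occurrence records, each citing one book by a short key.
occurrences = [
    {"occurrenceID": "occ-0001", "bookID": "Lavitska-1949"},
    {"occurrenceID": "occ-0002", "bookID": "Lavitska-1949"},
]

# Placeholder citation text, not a real bibliographic entry.
books = {"Lavitska-1949": "Full citation for Lavitska (1949) goes here"}


def reference_rows(occurrences, books):
    """Build one reference row per occurrence, keyed by the core ID."""
    return [
        {
            "occurrenceID": occ["occurrenceID"],  # link back to the core table
            "identifier": occ["bookID"],  # ISBN, DOI or a local key
            "bibliographicCitation": books[occ["bookID"]],
        }
        for occ in occurrences
    ]


rows = reference_rows(occurrences, books)

# Write the table as CSV text; a real export would write to a file instead.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["occurrenceID", "identifier", "bibliographicCitation"]
)
writer.writeheader()
writer.writerows(rows)
```

Because books repeat, the same identifier and citation appear on several rows; that is expected when each occurrence simply carries its own reference.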
We suggest you publish two datasets, one for the occurrences and one for the taxon checklist.
If you want some one-on-one help, then I or @MichalTorma can do a Zoom call with you, or you can paste the data here and I will show you how I would format it.
Reusing the Catalog of Life LSID or the GBIF taxonKey as the taxonID in your dataset is much much better than creating a new UUID.
Advanced question from Iryna. Maybe look at how Plazi publishes? Maybe look at the new ChecklistBank?