gbif / rs.gbif.org

GBIF machine-readable resources
https://rs.gbif.org

bibliographicCitation has two different uses in GBIF datasets #127

Open Mesibov opened 5 months ago

Mesibov commented 5 months ago

In DwC core, bibliographicCitation is "A bibliographic reference for the resource as a statement indicating how this record should be cited (attributed) when used" and is linked to DublinCore http://purl.org/dc/terms/bibliographicCitation at https://rs.gbif.org/core/dwc_occurrence_2022-02-02.xml.

In the Literature References extension, bibliographicCitation is "A text string referring to an un-parsed bibliographic citation", such as a journal article. This use is again linked to DublinCore, at https://rs.gbif.org/extension/gbif/1.0/references.xml.

The ambiguity leads to data compilers filling a bibliographicCitation field in occurrence datasets with items that should instead be in associatedReferences or identificationReferences.

A possible resolution is to rename bibliographicCitation to literatureCitation in the Literature References extension.

mdoering commented 5 months ago

I would not change an extension definition that has been in production for over a decade and is implemented in various software. The DC definition also fits literature extension records well, where the resource is a literature reference, not an occurrence or taxon (which the extension also applies to and is often used for):

DC: A bibliographic reference for the resource.

Mesibov commented 5 months ago

@mdoering, do you agree that the label "bibliographicCitation" applies to two different things in two different contexts (occurrence records and literature-extension records)?

Are there any other data labels in use by GBIF for which this is true?

mdoering commented 5 months ago

Yes, very likely. A term has a specific meaning in each class (term) and isn't necessarily globally defined. At a minimum, a term refers to the record/instance of the class it is on, not to the "core" record, e.g. an Occurrence. That's how the original extensions were all created. For example, dc:type is often used with values other than the DwC ones, e.g. in the Description extension. And dc:bibliographicCitation is the citation of the record; in the case of literature records it is a classic literature citation.
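
A minimal sketch of the scoping described above, in which a term's meaning is looked up per class rather than globally. The two definitions are the ones quoted earlier in this thread. The row type URIs and the helper name `definition_for` are assumptions for illustration and are not taken from the GBIF resource files.

```python
# Term URI shared by both contexts (Dublin Core).
BIBLIOGRAPHIC_CITATION = "http://purl.org/dc/terms/bibliographicCitation"

# Row types (classes). These URIs are assumed for illustration.
OCCURRENCE = "http://rs.tdwg.org/dwc/terms/Occurrence"
REFERENCE = "http://rs.gbif.org/terms/1.0/Reference"

# Definitions are keyed by (class, term), not by term alone,
# so one term can carry a different meaning in each class.
DEFINITIONS = {
    (OCCURRENCE, BIBLIOGRAPHIC_CITATION): (
        "A bibliographic reference for the resource as a statement "
        "indicating how this record should be cited (attributed) when used"
    ),
    (REFERENCE, BIBLIOGRAPHIC_CITATION): (
        "A text string referring to an un-parsed bibliographic citation"
    ),
}


def definition_for(row_type: str, term: str) -> str:
    """Return the definition of a term as scoped to one class (hypothetical helper)."""
    return DEFINITIONS[(row_type, term)]


if __name__ == "__main__":
    for row_type in (OCCURRENCE, REFERENCE):
        print(row_type)
        print("  ", definition_for(row_type, BIBLIOGRAPHIC_CITATION))
```

Keying on the pair makes the ambiguity explicit: a lookup by term URI alone cannot say which of the two meanings applies.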

Mesibov commented 5 months ago

@mdoering, I hadn't noticed this ambiguity before, i.e. "same label, different meanings in different [class] contexts". In a checklist dataset I audited last month, bibliographicCitation was a literature citation (one element of a record) in reference.txt. The linked occurrence.txt did not have a bibliographicCitation field, but if it had, the entries should have been identifiers for the record itself, not just for an element of the record. All too often bibliographicCitation in that latter role is misused, which is a different issue.