gbif / rs.gbif.org

GBIF machine-readable resources
https://rs.gbif.org

Add collectionCode and collectionID to eventCore #96

Open timrobertson100 opened 1 year ago

timrobertson100 commented 1 year ago

Originally reported here by @peterdesmet, providing the justification:

to indicate the (virtual) collection an event based dataset is derived from

This seems reasonable and fits within the intention of the DwC collectionCode description:

The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived.

peterdesmet commented 1 year ago

Thanks Tim. For completeness, I'd immediately add collectionID as well:

An identifier for the collection or dataset from which the record was derived.
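To make the proposal concrete, here is a minimal sketch of an Event Core `event.txt` that carries both terms at record level. It assumes a tab-separated file. All values (`EXAMPLE-DB`, the `collectionID` URL, the event IDs) and the helper names are hypothetical; only the term names and the Darwin Core namespace come from the standard.

```python
import csv
import io

DWC = "http://rs.tdwg.org/dwc/terms/"

# Columns of a hypothetical event.txt; collectionCode and collectionID are the terms proposed here.
FIELDS = ["eventID", "eventDate", "collectionCode", "collectionID"]

# Hypothetical events, all derived from the same (virtual) source collection.
events = [
    {"eventID": "ev-001", "eventDate": "2022-05-01",
     "collectionCode": "EXAMPLE-DB", "collectionID": "https://example.org/collections/example-db"},
    {"eventID": "ev-002", "eventDate": "2022-05-02",
     "collectionCode": "EXAMPLE-DB", "collectionID": "https://example.org/collections/example-db"},
]

def write_event_core(rows):
    """Serialise event rows as a tab-separated event.txt body and return it as a string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def meta_field_terms():
    """Full term URIs that a meta.xml field mapping would point each column at."""
    return {name: DWC + name for name in FIELDS}

print(write_event_core(events))
```

Once the terms are in the eventCore extension, the two extra columns map to them in `meta.xml` exactly as they already do for Occurrence Core.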

dagendresen commented 1 year ago

Wouldn't dwc:datasetID and dwc:datasetName be more appropriate than collectionCode and collectionID for this purpose? Or, even better, shouldn't we propose new record-level terms for project and projectID in Darwin Core?

peterdesmet commented 1 year ago

@dagendresen, in my experience datasetName and datasetID are mainly used for the published dataset itself (title + DOI). project and projectID could be useful, but they would indicate the project (cf. project in the metadata), not the originating source database.

Indicating the originating source database/system doesn't violate the definition for collectionCode in my opinion:

The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived.

It is already used as such for Occurrence datasets (not only by me, see e.g. EBIRD in https://www.gbif.org/occurrence/3504179613). I don't think it makes sense to exclude it if your dataset happens to be organized as an Event Core dataset, which is why I'm proposing here to include it.
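The point about core type can be illustrated with a small sketch: selecting records by `collectionCode` works the same whether the rows come from an Occurrence Core or an Event Core file. The rows and the `from_collection` helper are hypothetical; `EBIRD` is only borrowed from the occurrence linked above.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]

def from_collection(rows: Iterable[Row], code: str) -> List[Row]:
    """Return the rows whose record-level collectionCode equals `code`.

    The function never looks at the core type, so occurrence and event rows are
    treated identically as long as both carry collectionCode.
    """
    return [row for row in rows if row.get("collectionCode") == code]

# Hypothetical rows from an Occurrence Core and an Event Core dataset.
occurrence_rows = [
    {"occurrenceID": "occ-1", "collectionCode": "EBIRD"},
    {"occurrenceID": "occ-2", "collectionCode": "OTHER"},
]
event_rows = [
    {"eventID": "ev-1", "collectionCode": "EBIRD"},
    {"eventID": "ev-2"},  # no collectionCode: this row is simply not matched
]

print(len(from_collection(occurrence_rows, "EBIRD")))
print(len(from_collection(event_rows, "EBIRD")))
```

Without the term in the eventCore extension, the second selection has nothing standard to filter on.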

dagendresen commented 1 year ago

Shouldn't the datasetName and datasetID of the published dataset itself go in the dataset-level EML metadata rather than being a property of each record?

datasetName and datasetID are mainly used for the published dataset itself

Records inside the same dataset could be from different named specimen collections.

GBIF Norway uses datasetName as a (poor) proxy for "projectName": records inside the same dataset (DwC-A dataset) often originate from different projects (e.g., collecting expeditions) or have been improved in different GBIF-node-funded digitization, georeferencing, or other data quality projects. We therefore require grant recipients to use datasetName or datasetID to credit the node grant project.
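A minimal sketch of that usage, under hypothetical values: one archive has a single dataset-level title in its EML, while the record-level datasetName varies per project. `EML_TITLE`, the project names and `projects_in_archive` are illustrative only.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical dataset-level title, as it would appear once in the EML metadata.
EML_TITLE = "Example vascular plant collection"

# Hypothetical records from that one DwC-A; datasetName is used per record as a project label.
records: List[Dict[str, str]] = [
    {"occurrenceID": "r1", "datasetName": "Expedition 1998"},
    {"occurrenceID": "r2", "datasetName": "Expedition 1998"},
    {"occurrenceID": "r3", "datasetName": "Georeferencing project 2021"},
]

def projects_in_archive(rows: List[Dict[str, str]]) -> Counter:
    """Count records per record-level datasetName; rows without one fall back to EML_TITLE."""
    return Counter(row.get("datasetName", EML_TITLE) for row in rows)

print(projects_in_archive(records))
```

The fallback shows why the two levels are not interchangeable: the EML title can only ever name the whole archive.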

peterdesmet commented 1 year ago

Yeah, the uses of those fields differ. Note that the IPT does suggest the resource DOI for the datasetID (i.e. resource = datasetName/datasetID):
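The convention described here (resource DOI as datasetID, resource title as datasetName) can be sketched as a defaulting step. This is not the IPT's code: `fill_dataset_terms`, the DOI and the title are hypothetical, and values already on a record are deliberately left untouched.

```python
from typing import Dict

def fill_dataset_terms(record: Dict[str, str], resource_doi: str, resource_title: str) -> Dict[str, str]:
    """Return a copy of `record` with datasetID/datasetName defaulted to the resource's DOI/title.

    Values already present on the record are kept, so a record can still point at a
    different dataset or project than the published resource.
    """
    filled = dict(record)
    if not filled.get("datasetID"):
        filled["datasetID"] = resource_doi
    if not filled.get("datasetName"):
        filled["datasetName"] = resource_title
    return filled

# Hypothetical resource-level values.
DOI = "https://doi.org/10.xxxx/example"
TITLE = "Example event dataset"

print(fill_dataset_terms({"eventID": "ev-1"}, DOI, TITLE))
print(fill_dataset_terms({"eventID": "ev-2", "datasetName": "Project X"}, DOI, TITLE))
```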

[Screenshot 2023-02-22 at 10:59:04]

Irrespective of how the fields are actually used, I don't think collectionCode and collectionID should be excluded just because you structure your dataset as an Event Core dataset.