gbif / occurrence

Occurrence store, download, search
Apache License 2.0

Remove unused Dublin Core (DC) terms from the Darwin Core download #294

Closed MattBlissett closed 1 year ago

MattBlissett commented 1 year ago

Several Dublin Core (DC) terms are present in DWCA downloads even though they are not part of Darwin Core (DWC). They are therefore unused and always null.

These terms are part of DWC and must be kept:

KEEP http://purl.org/dc/terms/accessRights
KEEP http://purl.org/dc/terms/bibliographicCitation
KEEP http://purl.org/dc/terms/language
KEEP http://purl.org/dc/terms/license
KEEP http://purl.org/dc/terms/modified
KEEP http://purl.org/dc/terms/references
KEEP http://purl.org/dc/terms/rightsHolder
KEEP http://purl.org/dc/terms/type

This one is used by us, so we should keep it even though it's not DWC:

KEEP http://purl.org/dc/terms/publisher

This one should be removed, as already requested in https://github.com/gbif/pipelines/issues/821:

REMOVE http://purl.org/dc/terms/identifier

These should all be removed:

http://purl.org/dc/terms/abstract
http://purl.org/dc/terms/accrualMethod
http://purl.org/dc/terms/accrualPeriodicity
http://purl.org/dc/terms/accrualPolicy
http://purl.org/dc/terms/alternative
http://purl.org/dc/terms/audience
http://purl.org/dc/terms/available
http://purl.org/dc/terms/conformsTo
http://purl.org/dc/terms/contributor
http://purl.org/dc/terms/coverage
http://purl.org/dc/terms/created
http://purl.org/dc/terms/creator
http://purl.org/dc/terms/dateAccepted
http://purl.org/dc/terms/dateCopyrighted
http://purl.org/dc/terms/dateSubmitted
http://purl.org/dc/terms/description
http://purl.org/dc/terms/educationLevel
http://purl.org/dc/terms/extent
http://purl.org/dc/terms/hasFormat
http://purl.org/dc/terms/hasPart
http://purl.org/dc/terms/hasVersion
http://purl.org/dc/terms/instructionalMethod
http://purl.org/dc/terms/isFormatOf
http://purl.org/dc/terms/isPartOf
http://purl.org/dc/terms/isReferencedBy
http://purl.org/dc/terms/isReplacedBy
http://purl.org/dc/terms/isRequiredBy
http://purl.org/dc/terms/isVersionOf
http://purl.org/dc/terms/issued
http://purl.org/dc/terms/mediator
http://purl.org/dc/terms/medium
http://purl.org/dc/terms/provenance
http://purl.org/dc/terms/relation
http://purl.org/dc/terms/replaces
http://purl.org/dc/terms/requires
http://purl.org/dc/terms/rights
http://purl.org/dc/terms/source
http://purl.org/dc/terms/spatial
http://purl.org/dc/terms/subject
http://purl.org/dc/terms/tableOfContents
http://purl.org/dc/terms/temporal
http://purl.org/dc/terms/title
http://purl.org/dc/terms/valid

(See https://github.com/gbif/pipelines/issues/837 for rights and source.)

And these are not present in the download anyway:

http://purl.org/dc/terms/date
http://purl.org/dc/terms/format

The code linked below probably needs tidying up. Maybe DWC_PROPERTIES should include the DcTerms that are part of DWC?

https://github.com/gbif/occurrence/blob/94d295c98ef79f7a7d5930aad7c97579ec9f260c/occurrence-common/src/main/java/org/gbif/occurrence/common/TermUtils.java#L450

https://dwc.tdwg.org/terms/
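
To make the proposal concrete, here is a minimal sketch of the keep/remove rule described above. It is not the actual TermUtils code: the class name `DcTermFilter`, the constants and both methods are hypothetical, and it works on plain term URIs rather than GBIF's term classes. The idea is to keep an explicit allow-list of the DC terms that DWC borrows, plus `publisher`, and drop every other term in the DC namespace.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of gbif/occurrence.
final class DcTermFilter {

  static final String DC_NS = "http://purl.org/dc/terms/";

  // DC terms that are part of DWC (the first KEEP list in this issue).
  static final Set<String> DWC_DC_TERMS = Set.of(
      "accessRights", "bibliographicCitation", "language", "license",
      "modified", "references", "rightsHolder", "type");

  // Not part of DWC, but used by GBIF, so kept as well.
  static final Set<String> GBIF_DC_TERMS = Set.of("publisher");

  // Returns true if the term should stay in the download.
  // Terms outside the DC namespace are not affected by this rule.
  static boolean isKept(String termUri) {
    if (!termUri.startsWith(DC_NS)) {
      return true;
    }
    String name = termUri.substring(DC_NS.length());
    return DWC_DC_TERMS.contains(name) || GBIF_DC_TERMS.contains(name);
  }

  // Filters a candidate column list, preserving the input order.
  static List<String> downloadColumns(List<String> candidateTerms) {
    return candidateTerms.stream()
        .filter(DcTermFilter::isKept)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> candidates = List.of(
        "http://purl.org/dc/terms/license",
        "http://purl.org/dc/terms/identifier",
        "http://purl.org/dc/terms/publisher",
        "http://purl.org/dc/terms/abstract",
        "http://rs.tdwg.org/dwc/terms/scientificName");
    downloadColumns(candidates).forEach(System.out::println);
  }
}
```

With an allow-list, any DC term added to the term library later is excluded from the download by default, and has to be added to one of the two sets deliberately.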