gbif / doc-freshwater-data-publishing-guide

https://doi.org/10.35035/doc-sw3k-w725

GBIF-required metadata #20

Open CecSve opened 5 months ago

CecSve commented 5 months ago

Feedback for: https://docs.gbif-uat.org/freshwater-data-publishing-guide/en/#gbif-required-metadata

The first sentence is a repetition of the beginning of the last section and can be deleted:

The metadata required by GBIF describes details about a dataset that include its scope, ownership and usage rights.

The next section:

When datasets are downloaded individually from GBIF, the XML metadata file is included and metadata fields from this table are automatically added to the occurrence file. When data are selected for download from within a polygon (thereby choosing data from multiple studies over a given geographic area), less of the metadata is provided in the occurrence table, but the permanent link to the data selection (provided by GBIF with the data download) allows the user to explore metadata for each individual project.

refers to the use of data, not the publishing of data. I would recommend deleting it too, as it might confuse publishers.

MattBlissett commented 5 months ago

The "Terms" in the table here are the labels from the IPT's metadata editor interface, not the EML (or GBIF Metadata Profile) term names.

The "required" fields are not correct, e.g. geographic coverage is optional — though recommended, of course.

The citation field from metadata is generally not used within GBIF.

jenlento commented 2 months ago

@CecSve Thank you for the suggestions. We deleted the first sentence as suggested. For the next paragraph, we prefer to leave this text in, as it is a situation where it's useful for those providing data to understand what happens to the metadata and how this information is provided to the user. We've added "It is useful to know..." to the start of this paragraph.

@MattBlissett These terms and the list of required terms came from the list that is mandatory on the IPT. We are not clear why the EML or GBIF Metadata Profile would differ; we pointed this out to GBIF earlier and noted that it needed discussion and clarification. If users are providing data, they must provide the data that are required on the IPT, otherwise the data cannot be pushed to GBIF.

As for the citation field, data downloaded from GBIF come with the citation field. It is therefore in the best interests of data providers to fill in this field.

jenlento commented 1 month ago

Leaving this open in case anyone wants to discuss the issue of required terms.

MattBlissett commented 1 month ago

I think the terms being in monospace font and having had spaces removed suggests these are technical identifiers, but they are not — they are the labels used in the IPT interface.

For example, the EML elements abstract, intellectualRights and creator are labelled as "Description", "Licence" and "Resource creator" in the IPT's user interface.
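
The label-versus-element distinction above can be sketched as a small lookup. This is only an illustration built from the three examples named in this comment; the dictionary and function names are hypothetical and are not part of the IPT or of any GBIF library.

```python
from typing import Optional

# Hypothetical lookup, limited to the three label/element pairs mentioned in this thread.
# It is not a complete mapping of the IPT metadata editor.
IPT_LABEL_TO_EML = {
    "Description": "abstract",
    "Licence": "intellectualRights",
    "Resource creator": "creator",
}


def eml_element_for(ipt_label: str) -> Optional[str]:
    """Return the EML element name for an IPT interface label, or None if it is not listed."""
    return IPT_LABEL_TO_EML.get(ipt_label)


print(eml_element_for("Description"))
print(eml_element_for("Citation"))
```

A lookup like this makes it explicit that the interface label and the schema element are two different names for the same field.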

It's not necessary to use the IPT to publish data to GBIF, although it is the most common method.

We don't use the citation field from the EML. Dataset citations are generated from the dataset contacts and organization.

ManonGros commented 1 month ago

Decision: use the EML terms instead of the IPT ones (also see https://rs.gbif.org/schema/eml-gbif-profile/1.3/)

CecSve commented 1 month ago

The most recent GBIF EML schema is found here. The XML renders if you open the schema in Chrome, but not in Firefox. The schema marks optional elements with minOccurs="0".
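
As a rough sketch of reading minOccurs programmatically instead of in a browser, the snippet below classifies the elements of an XSD sequence. The embedded schema fragment and its element names are invented for illustration and are not the GBIF EML profile. The real schema may use references, imports or nested types that this simple walk does not handle.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Invented fragment for illustration only; this is NOT the GBIF EML profile.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="dataset">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="exampleRequired" type="xs:string"/>
        <xs:element name="exampleOptional" type="xs:string" minOccurs="0"/>
        <xs:element name="exampleRepeatable" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def classify_elements(xsd_text: str) -> dict:
    """Map each named element inside an xs:sequence to 'required' or 'optional'."""
    root = ET.fromstring(xsd_text)
    result = {}
    for sequence in root.iter(f"{{{XS}}}sequence"):
        for element in sequence.findall(f"{{{XS}}}element"):
            name = element.get("name")
            if name is None:
                continue  # skip element references (ref="..."), which carry no name
            # In XML Schema, minOccurs defaults to 1 when the attribute is absent.
            min_occurs = element.get("minOccurs", "1")
            result[name] = "optional" if min_occurs == "0" else "required"
    return result


for name, status in classify_elements(SAMPLE_XSD).items():
    print(f"{name}: {status}")
```

The key point is the default: an element without a minOccurs attribute must occur at least once, so only an explicit minOccurs="0" makes it optional.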

I will provide a table of what is and is not required, possibly with one column for the EML term and one for what you see in the IPT. However, I would advise against showing both and recommend sticking to EML, as was decided previously, since I imagine publishers would find it quite confusing to navigate the differences.

CecSve commented 1 month ago

| Term IPT | Term EML | Status EML | Within EML element |
|---|---|---|---|
| title | title | Required | |
| description | abstract | Required | |
| metadataLanguage | | | |
| dataLanguage | | | |
| publishingOrganization | organizationName | Required | publisher |
| type | | | |
| updateFrequency | maintenanceUpdateFrequency | Not required (required if maintenance is specified) | maintenance |
| dataLicense | licensed | Required | |
| resourceContact(s) | contact | Required | |
| resourceCreator(s) | creator | Required | |
| metadataProvider(s) | | Not required | |
| geographicCoverage | coverage | Required | |
| projectData | project | Required | |
| samplingMethods | samplingDescription | Required | methods |
| citation | | | |

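A publisher could use a table like the one above as a simple checklist. The sketch below takes the "Required" entries in the Status EML column at face value; note that an earlier comment in this thread describes geographic coverage as optional, so the list is an assumption rather than a verified reading of the schema. The flat dictionary, the function name and the sample values are hypothetical. Real EML is nested XML.

```python
# EML terms marked "Required" in the table in this thread, taken at face value.
# Assumption: these statuses have not been checked against the schema itself.
REQUIRED_EML_TERMS = [
    "title",
    "abstract",
    "organizationName",
    "licensed",
    "contact",
    "creator",
    "coverage",
    "project",
    "samplingDescription",
]


def missing_required_terms(metadata: dict) -> list:
    """Return the required terms that are absent or empty in a flat metadata dict."""
    return [term for term in REQUIRED_EML_TERMS if not metadata.get(term)]


# Hypothetical draft record with only three fields filled in.
draft = {
    "title": "Example stream survey",
    "abstract": "Hypothetical dataset description.",
    "creator": "A. Author",
}

print(missing_required_terms(draft))
```
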
dschigel commented 1 month ago

"Decision: use the EML terms instead of the IPT ones (also see https://rs.gbif.org/schema/eml-gbif-profile/1.3/)"

So do we agree to use the EML term where present (in @CecSve's table above), and keep the IPT term where there is no EML equivalent? It would be helpful to clarify the editing task by sorting our terms into to-keep and to-replace (and with what). But now we know why!

CecSve commented 1 month ago

For clarity, I included all the fields from the original table. I would propose removing dataLanguage, metadataLanguage, citation, metadataProviders and type, as well as the Term IPT column.

schmikloi commented 1 month ago

@CecSve I changed the terms in the metadata table to EML language and requirements; but I do not know why you propose to delete the lines like dataLanguage etc. I think they are useful (and are needed for IPT upload). Not sure what the "Term IPT column" is?

jenlento commented 3 weeks ago

We have updated the text and tables as suggested, but we've left in some of the terms that we felt were useful and indicated that, although they are not required by EML, they contain useful information. Hopefully this is a good compromise.