CecSve opened this issue 1 month ago
Perhaps a table specifying which elements are required by the EML profile and which are required by the IPT metadata editor, similar to this table. Additional information on which elements are EML-derived and which are GBIF-specific would be nice as well, similar to the logo we add in the definitions of this table to indicate whether a term is a GBIF term or a data standard term.
We need to avoid publishing conflicting information.
There are terms with a strict technical requirement enforced by the data format — an EML document is invalid without them. These are specified in the XSD schema definition in the case of EML / GBIF's EML profile.
There are additional terms where the requirement is enforced by our API or other processing, e.g. licence.
And there are terms where we write that they are required, but there's no technical enforcement of this: https://www.gbif.org/data-quality-requirements-occurrences — occurrenceID, basisOfRecord, scientificName, eventDate.
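The three tiers above differ mainly in where the requirement is checked. Below is a minimal sketch of tiers 1 and 3 only, under stated assumptions. `schema_required` reads `minOccurs` from an XSD to find elements a document is invalid without. The XSD here is a hypothetical, heavily simplified fragment, not the real GBIF EML profile. `missing_terms` is a hypothetical advisory check for the four occurrence terms named on the data-quality page. Tier 2 (e.g. licence) is left out because its rules live in GBIF's API and processing and are not reproduced here.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Tier 1 input: hypothetical, simplified schema fragment (NOT the GBIF EML profile XSD).
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="dataset">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="creator" type="xs:string" maxOccurs="unbounded"/>
        <xs:element name="abstract" type="xs:string" minOccurs="0"/>
        <xs:element name="contact" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Tier 3: terms the data-quality page calls required, with no technical enforcement.
DQ_REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")


def schema_required(xsd_text, parent):
    """Names of elements declared inside `parent` that the schema requires.

    minOccurs defaults to 1 in XML Schema, so an element without the attribute
    is required. This sketch ignores refs, choices, groups, imported types and
    minOccurs set on the enclosing sequence.
    """
    root = ET.fromstring(xsd_text)
    for el in root.iter(XS + "element"):
        if el.get("name") == parent:
            return [
                child.get("name")
                for child in el.iter(XS + "element")
                if child is not el and child.get("minOccurs", "1") != "0"
            ]
    return []


def missing_terms(record, terms=DQ_REQUIRED_TERMS):
    """Terms that are absent or blank in one occurrence record (a warning, not an error)."""
    missing = []
    for term in terms:
        value = record.get(term)
        if value is None or not str(value).strip():
            missing.append(term)
    return missing


print(schema_required(SAMPLE_XSD, "dataset"))
# Hypothetical record with an empty eventDate.
print(missing_terms({"occurrenceID": "urn:example:1", "basisOfRecord": "HumanObservation",
                     "scientificName": "Puma concolor", "eventDate": ""}))
```

The point of keeping the two functions separate is that a schema violation makes the document invalid, whereas `missing_terms` could only ever feed a warning, which is exactly the distinction we need to avoid blurring in published documentation.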
The GBIF Metadata Profile document could be migrated into the tech docs: https://ipt.gbif.org/manual/en/ipt/latest/gbif-metadata-profile
As it's in the IPT at present, it has complete Spanish and Japanese translations and we don't want to lose these.
I will take the relevant sections of the metadata profile guide and move them to a dedicated metadata page in the tech docs. I will first commit the entire sections as-is and then make edits in separate commits, so that translations of the original sections can still be added.
The sections being moved will remain in the IPT manual during the transition phase and will link to the relevant sections in the tech docs.
I don't think it is possible with the current format of the EML profile, but it would be nice if we could harvest the definitions from the schema itself instead of copy-pasting them. Would we need to generate XML schema documentation to transfer the descriptions in the current GBIF EML Profile?
For the Darwin Core Archive core/extension XML I used a Python script to generate a snippet of AsciiDoctor: https://github.com/gbif/tech-docs/blob/main/en/data-use/modules/ROOT/partials/download-terms-tables.py
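Harvesting definitions from the schema could, in principle, work similarly if the descriptions live in `xs:annotation/xs:documentation` elements. The sketch below is hypothetical and separate from the linked script, whose contents are not reproduced here. It assumes that annotation layout, uses a made-up two-element XSD fragment, and emits an AsciiDoctor table with a "Required" column derived from `minOccurs`.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment; assumes definitions are stored as xs:documentation.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="DatasetType">
    <xs:sequence>
      <xs:element name="title" type="xs:string">
        <xs:annotation><xs:documentation>A descriptive title for the dataset.</xs:documentation></xs:annotation>
      </xs:element>
      <xs:element name="abstract" type="xs:string" minOccurs="0">
        <xs:annotation><xs:documentation>A brief overview of the dataset.</xs:documentation></xs:annotation>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def asciidoc_table(xsd_text):
    """Build an AsciiDoctor table of element name, required flag and definition."""
    root = ET.fromstring(xsd_text)
    lines = ['[cols="1,1,3",options="header"]', "|===", "|Element |Required |Definition", ""]
    for el in root.iter(XS + "element"):
        name = el.get("name")
        if name is None:
            continue  # elements declared via ref="..." are skipped in this sketch
        doc = el.find(f"{XS}annotation/{XS}documentation")
        # Collapse whitespace and escape pipes so they don't split AsciiDoc cells.
        text = " ".join(doc.text.split()) if doc is not None and doc.text else ""
        text = text.replace("|", r"\|")
        required = "Yes" if el.get("minOccurs", "1") != "0" else "No"
        lines.append(f"|{name} |{required} |{text}")
    lines.append("|===")
    return "\n".join(lines)


print(asciidoc_table(SAMPLE_XSD))
```

If the GBIF EML Profile's descriptions are not in the XSD in this form, this approach would first need the schema annotated, which is essentially the question raised above.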
Based on the discussion here https://github.com/gbif/doc-freshwater-data-publishing-guide/issues/20
It is unclear to publishers which metadata is required and which is not. There may be differences between EML (or the GBIF Metadata Profile) and the IPT's metadata editor interface that should be resolved.