gbif / ipt

GBIF Integrated Publishing Toolkit (IPT)
https://www.gbif.org/ipt
Apache License 2.0

Preserve keywords in uploaded EML #2352

Open: thomasstjerne opened this issue 9 months ago

thomasstjerne commented 9 months ago

When a new dataset is created by uploading a DwC archive that contains an EML file, the IPT seems to overwrite the keywordCollections.

For example, this archive generated by the eDNA tool has the keywords ['metabarcoding', 'faeces', 'DNA', 'eDNA']:
archive: https://hosted-datasets.gbif-uat.org/edna/808096f4-6672-411a-9dc4-a12f40b36e5f/1/archive.zip
dataset: https://api.gbif-uat.org/v1/dataset/0d81f98a-bf11-4887-93ff-3328f17d7b9f
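To check which keywords an archive's EML actually carries (before uploading, or after downloading the published archive), a short Python sketch like the following can list every keywordSet together with its keywordThesaurus. It assumes the EML is stored as `eml.xml` at the archive root; a DwC-A names its metadata file in `meta.xml`, so adjust `eml_name` if needed. The function names are hypothetical helpers, not part of the IPT.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Optional, Tuple


def keywords_by_set(eml_xml: bytes) -> List[Tuple[Optional[str], List[str]]]:
    """Return (thesaurus, keywords) for every keywordSet in an EML document.

    Only the EML root element is namespaced; child elements such as
    keywordSet are unqualified, so plain tag names match them.
    """
    root = ET.fromstring(eml_xml)
    result = []
    for kw_set in root.iter("keywordSet"):
        keywords = [(k.text or "").strip() for k in kw_set.findall("keyword")]
        thesaurus = kw_set.findtext("keywordThesaurus")
        # Treat a missing or blank keywordThesaurus as None.
        result.append((thesaurus.strip() or None if thesaurus else None, keywords))
    return result


def keywords_in_archive(archive_path: str, eml_name: str = "eml.xml"):
    """Read the EML out of a DwC archive (assumed file name: eml.xml)."""
    with zipfile.ZipFile(archive_path) as zf:
        return keywords_by_set(zf.read(eml_name))
```

A `None` thesaurus in the output marks a keyword set that has no keywordThesaurus.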

However, once the dataset is published through the IPT, the EML has changed and the keywords are gone:
https://danbif.au.dk/ipt/archive.do?r=plants_other_eukaryotes_from_herbivore_faeces_argentina
https://api.gbif.org/v1/dataset/e94f1b8b-671f-4ca4-9e4d-ffef194430da

mike-podolskiy90 commented 9 months ago

Thanks Thomas, I'll have a look

mike-podolskiy90 commented 9 months ago

The IPT requires the Thesaurus/Vocabulary field to be filled in (the Keywords section in Metadata, keywordThesaurus in EML). If it is left empty, you can still publish the resource, but those keywords won't appear in the archive's EML.
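Given that behaviour, one possible workaround is to make sure every keywordSet has a non-empty keywordThesaurus before the archive is uploaded. The Python sketch below does that with the standard library; the helper names and the placeholder value passed in (e.g. "N/A") are hypothetical, and a real thesaurus name or URI is preferable where one exists. Note that re-serializing with ElementTree may change formatting and namespace prefixes, so treat this as a sketch rather than a drop-in tool.

```python
import xml.etree.ElementTree as ET
from typing import List


def sets_missing_thesaurus(root: ET.Element) -> List[ET.Element]:
    """keywordSets with no (or a blank) keywordThesaurus, i.e. the ones
    whose keywords are left out of the IPT's published EML."""
    missing = []
    for kw_set in root.iter("keywordSet"):
        text = kw_set.findtext("keywordThesaurus")
        if not (text and text.strip()):
            missing.append(kw_set)
    return missing


def fill_thesaurus(eml_xml: bytes, thesaurus: str) -> bytes:
    """Add or fill keywordThesaurus in every keywordSet that lacks one.

    In the EML schema keywordThesaurus follows the keyword elements,
    so a new one is appended as the last child of the keywordSet.
    """
    root = ET.fromstring(eml_xml)
    if root.tag.startswith("{"):
        # Keep the conventional "eml" prefix instead of ElementTree's "ns0".
        ET.register_namespace("eml", root.tag[1:].split("}", 1)[0])
    for kw_set in sets_missing_thesaurus(root):
        element = kw_set.find("keywordThesaurus")
        if element is None:
            element = ET.SubElement(kw_set, "keywordThesaurus")
        element.text = thesaurus
    return ET.tostring(root, encoding="utf-8")
```

Usage: `fixed = fill_thesaurus(open("eml.xml", "rb").read(), "N/A")`, then write `fixed` back into the archive before uploading it to the IPT.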

mike-podolskiy90 commented 9 months ago

According to the EML specification, keywordThesaurus is indeed required.