Open pvgenuchten opened 4 years ago
Thanks @pvgenuchten - those questions really jog the memory as this was all done in around 2009 I think.
The people who did this aren't here to ask anymore but I recall that EML was itself releasing 2.1.x at that time, and there were some issues that led us to hosting a version of the EML xsd. I could ping Matt Jones and see if he can remember if it helps.
The GBIF EML profile was developed as a small subset of EML and then extended following the recommendations at the time in the additionalMetadata element. There is some text around it here. To my knowledge, when it was first released a GBIF metadata document would always validate against the EML schema too.
For the 2.2.0 EML schema, do you envision a similar implementation of the gbif profile?
We have no plans to, but I would assume at some point we will be bumping to a newer EML. I suspect it has changed significantly so we'd follow whatever recommendations there are at the time.
Are you just exploring, or are you hitting real issues please, in which case we can spend time investigating too?
Hi @timrobertson100, thanks for the quick reply. We are trying to update the GeoNetwork schema plugin to 2.2.0, but we are having some challenges with the 2.2.0 XSD. We noticed that https://github.com/gbif/eml-profile/blob/master/eml-gbif-profile.xsd works out of the box in GeoNetwork.
Mmm... I am sorry I am not more helpful. There was a genuine reason we ended up hosting XSDs but I am afraid that escapes me. @mdoering do you recall anything around this, please?
The GBIF EML profile is a subset of the full EML, so our schema cherry-picks the parts we thought were relevant and supported by GBIF. We therefore had to create our own XSD using the proper EML namespace, 2.1.1 at that time. Having several schemas for the same namespace is not uncommon in the XML world.
It also specifies exactly what GBIF supports in the additionalMetadata slot.
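To illustrate the namespace point, here is a minimal sketch that reads the EML version a document claims from its root element, using only Python's standard library. The function name `eml_version` is hypothetical, and the two namespace URIs are my understanding of the published EML 2.1.1 and 2.2.0 namespaces rather than anything taken from GBIF code.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as I understand them from the EML releases (assumption).
KNOWN_NAMESPACES = {
    "eml://ecoinformatics.org/eml-2.1.1": "2.1.1",
    "https://eml.ecoinformatics.org/eml-2.2.0": "2.2.0",
}


def eml_version(xml_text):
    """Return the EML version implied by the root namespace, or None."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes qualified tags as "{namespace}localname".
    if not root.tag.startswith("{"):
        return None
    namespace = root.tag[1:].split("}", 1)[0]
    return KNOWN_NAMESPACES.get(namespace)
```

Because the GBIF profile reuses the EML namespace, a check like this cannot tell a profile document from a full EML one; it only reports which EML release the document declares.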
Thanks, @mdoering. There was a reason we ended up hosting the full EML XSDs too though, and I forget exactly why. I seem to recall something like a broken EML build - ring any bells?
I think it was this: https://github.com/gbif/rs.gbif.org/commit/0f0ca8423e09267fd515617dee018e008163edee
Use GBIF hosted xml.xsd as W3C times out
We slightly modified the original schemas to use a local xml.xsd file to avoid the timeouts
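For anyone wanting to reproduce that kind of change, a rough sketch follows. It assumes the schemas pull in the W3C file through an `import` element for the XML namespace with a double-quoted `schemaLocation`; the function name `use_local_xml_xsd` and the default path `xml.xsd` are hypothetical, and this is not the script used for the commit above.

```python
import re

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Matches any <import ...> start tag, with or without a prefix such as xs:.
IMPORT_TAG = re.compile(r"<(?:\w+:)?import\b[^>]*>")
# Assumes the attribute value is double-quoted.
SCHEMA_LOCATION = re.compile(r'schemaLocation\s*=\s*"[^"]*"')


def use_local_xml_xsd(schema_text, local_path="xml.xsd"):
    """Point imports of the XML namespace at a local copy of xml.xsd."""

    def rewrite(match):
        tag = match.group(0)
        if XML_NS not in tag:
            return tag  # some other import: leave it untouched
        replacement = 'schemaLocation="%s"' % local_path
        return SCHEMA_LOCATION.sub(lambda m: replacement, tag)

    return IMPORT_TAG.sub(rewrite, schema_text)
```

The rewrite works on the text rather than on a parsed tree so that namespace prefixes stay exactly as written, since attribute values such as `type="xs:string"` depend on them.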
Thanks @mdoering - there was something relating to EML XSDs before that (pre-github), but I forget the details now.
@pvgenuchten - is there anything more we can do here, or should we close the issue please?
For the 2.2.0 EML schema, do you envision a similar implementation of the gbif profile?
FYI in fact we also have a question about moving over to 2.2. We are in the process of making our profile in EML 2.2 (mapping from 2.1 is not such an issue, but adding the necessary semantic annotations will take us a bit longer), and since some of our data ends up in GBIF, we were wondering about this. Would you not be able to harvest/receive data that had eml.xml using EML 2.2?
Would you not be able to harvest/receive data that had eml.xml using EML 2.2?
My guess is that most likely we would not without some software changes. To be honest we've been neglecting this area mainly because there hasn't been much demand to change things and it's working out OK. If that demand came (e.g. a push from publishers) then we would re-prioritize of course.
OK, I see. Makes sense. FYI there is a demand from various European FAIR data programmes that metadata are semantically annotated. This is possible in a machine-interoperable and standardised way in EML 2.2 but not fully so in EML 2.1. That is why we are updating, to become more FAIR.
Hi gang, sorry to revive this thread but we had a set of workshops across Canada recently where we picked up DwC and EML and talked about making DwC-A for all sorts of data, and one big ask from our colleagues on the federal side was for a nice way to properly treat multilingual data. It looks like later EML versions allow for all fields to have a language attribute, and this would solve Canada's requirement to provide equal billing in both official languages, plus the obvious advantages to pan-European organizations.
Is it still the plan to update the GBIF EML schema, and if so, would there be interest in building up the IPT's awareness of this aspect, and potentially rolling all the way into field-by-field multilingual support for GBIF-EML?
Would you not be able to harvest/receive data that had eml.xml using EML 2.2?
My guess is that most likely we would not without some software changes. To be honest we've been neglecting this area mainly because there hasn't been much demand to change things and it's working out OK. If that demand came (e.g. a push from publishers) then we would re-prioritize of course.
We have had some issues come up on the help desk over the last six months with publishers using EML version 2.1.1, so there seems to be a need to support this version. There was a miscommunication between EML versions and GBIF's EML schema, so please ignore this comment.
Would you not be able to harvest/receive data that had eml.xml using EML 2.2?
My guess is that most likely we would not without some software changes. To be honest we've been neglecting this area mainly because there hasn't been much demand to change things and it's working out OK. If that demand came (e.g. a push from publishers) then we would re-prioritize of course.
We have had some issues come up on the help desk over the last six months with publishers using EML version 2.1.1, so there seems to be a need to support this version.
:-D :-D :-D
So that means still no plans to update to EML 2.2? If it is a question of human resources, we at the open science team at VLIZ can help out, as we have been investigating converting our 2.1s to 2.2s for a while. We have held off implementing only because much of our data goes to GBIF, so we prefer to export an EML that GBIF can work with. However, several issues keep coming up for us: annotations (to make metadata more machine-accessible); the location of the physical-distribution module (GeoNetwork, for example, does not work well with it being placed in additionalMetadata); language (as @jdpye mentioned in a comment above, we in Belgium also have a multilingual audience); and other catalogues we export to, which have already updated to EML 2.2. Not that I am trying to push here, or anything :-D ... just to encourage and offer a hand.
Thank you very much @kmexter - It really is just a case that we haven't gotten to it... That you are keen to help would be greatly appreciated.
I think as a starting point, I suggest we document the changes from our (fairly minimal) profile to the latest version. We can then refer to that in the github issues for the code changes needed in the IPT and the Registry.
Would you be willing to try and start that perhaps?
Sure, we can help: we have done a similar exercise here - quite a while ago so I will have to drag out the notes and refresh my memory, but I'd be happy to share that with you.
Thank you very much
you can email me directly on katrina.exter@vliz.be. just tell me what you need or we can have a chat. k
I leave here a few notes on changes in EML 2.2 that I would like to see discussed as candidates for a GBIF implementation, apart from just a blind 1:1 upgrade. In particular, the much improved support for bibliographic references using BibTeX would help a lot in being more compatible with the metadata handled in ChecklistBank and ColDP.
The official EML Data Paper Example seems useful to look at: https://github.com/NCEAS/eml/blob/main/src/test/resources/eml-data-paper.xml
I've created a new GBIF EML profile of EML 2.2 https://rs.gbif-uat.org/schema/eml-gbif-profile/1.3/eml.xsd It is now deployed to GBIF UAT environment, including IPT (ipt.gbif-uat.org) and the api.
Goal was to migrate with minimal changes, so any further improvements will be added incrementally.
I've gone through the new features of EML 2.2.0, as well as older elements which weren't included in the GBIF Metadata Profile. I'll list them together, as it's over 10 years since the previous version so it's worth reconsidering everything. New in 2.2.0 is marked with ¤.
EML 2.2.0: https://eml.ecoinformatics.org/whats-new-in-eml-2-2-0.html
EML 2.1.1: https://sbclter.msi.ucsb.edu/external/InformationManagement/EML_211_schema/docs/eml-2.1.1/
GBIF Metadata Profile: https://ipt.gbif.org/manual/en/ipt/latest/gbif-metadata-profile
New or absent elements.
Those I've put in bold seem good contenders for GBIF's support.
- dataset/shortname: unlikely we'd use this
- dataset/title: we describe the older multilingual method in the GMP document, but never implemented it
- dataset/creator/individualName/salutation: not so much for EML, but could be useful for some publishers
- dataset/series: don't think this is relevant to us
- dataset/licensed/url: ¤ replaces the ulink kludge we have in dataset/intellectualRights. EML also recommend a particular set of licence URIs, different to what we use ourselves (see also https://github.com/gbif/ipt/issues/1967)
- dataset/distribution/online: could describe endpoints, or locations for downloads, rather than only homepages. Should replace additionalMetadata/metadata/gbif/physical/...
- dataset/coverage/taxonomicCoverage/taxonomicClassification/taxonId: ¤ taxonomic identifiers now supported.
- dataset/annotation: ¤ structured annotations (key-value).
- dataset/purpose: "A synopsis of the purpose of this dataset"
- dataset/introduction: ¤ recommended for data papers
- dataset/gettingStarted: ¤ data papers
- dataset/acknowledgements: ¤ also recommended for data papers
- dataset/publisher: for the publisher?
- dataset/pubPlace: just a publisher address, not sure why this wouldn't be in dataset/publisher/address. Maybe in case it's different?
- dataset/project/award: ¤ new for structured funding information, all useful for project tracking
- dataset/dataTable: data, could hold data mappings for published data and/or downloads
- dataset/referencePublication: ¤ "Common cases where a Reference Publication may be useful include when a data paper is published that describes the dataset, or when a paper is intended to be the canonical or examplar reference to the dataset." Could reference all datasets in a download, and source in treatments.
- dataset/usageCitation: ¤ would be a way to show citations of GBIF datasets. Can now use BibTeX.
- dataset/literatureCited: ¤ replaces additionalMetadata/metadata/gbif/bibliography. Can now use BibTeX.

We describe this in the GMP guide, but never implemented it in the Registry. Now we can implement the newer method.
This seems important for a new profile version, but needs less-compatible API changes to the Registry to expose the result. Perhaps we could implement something without changing /v1/dataset (e.g. return English, or use the Accept-Language header, or use the primary language of the dataset), use /v1/dataset/..../document as a way to expose the full details, giving time to decide upon /v2/dataset.
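A rough sketch of how the fallback order could work, assuming titles have already been collected into a dictionary keyed by a simple language code such as `en` or `fr`. The names `parse_accept_language` and `pick_title` are hypothetical and this is not existing Registry code; the header parsing is deliberately simplified.

```python
def parse_accept_language(header):
    """Return the language ranges of an Accept-Language header, best first."""
    ranges = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by order in the header.
            ranges.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(ranges)]


def pick_title(titles, accept_language, primary_language):
    """Choose one title: requested language, dataset language, then English."""
    by_lang = {lang.lower(): text for lang, text in titles.items()}
    for tag in parse_accept_language(accept_language or ""):
        if tag == "*":
            break  # the client accepts anything: use the fallbacks below
        # Try the full tag first, then its primary subtag ("fr-ca" -> "fr").
        for candidate in (tag, tag.split("-")[0]):
            if candidate in by_lang:
                return by_lang[candidate]
    for candidate in ((primary_language or "").lower(), "en"):
        if candidate in by_lang:
            return by_lang[candidate]
    # Last resort: whatever title exists, or None if there are none.
    return next(iter(by_lang.values()), None)
```

For example, `pick_title({"en": "Birds of Canada", "fr": "Oiseaux du Canada"}, "fr-CA,fr;q=0.8", "en")` would return the French title, while a request with no header falls back to the dataset's primary language.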
The existing additionalMetadata/metadata/gbif/metadataLanguage should be using the xml:lang attribute on the eml or dataset element.
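A reader could support both styles during a transition. The sketch below is only an illustration under stated assumptions: the function name `metadata_language` is hypothetical, and it assumes child elements such as `dataset` and `gbif` are unqualified (no namespace), as in the EML documents I have seen.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
LEGACY_PATH = "additionalMetadata/metadata/gbif/metadataLanguage"


def metadata_language(xml_text):
    """Return xml:lang from dataset or eml, else the legacy GBIF element."""
    root = ET.fromstring(xml_text)
    dataset = root.find("dataset")
    # The innermost xml:lang wins, so check dataset before the root element.
    for element in (dataset, root):
        if element is not None and element.get(XML_LANG):
            return element.get(XML_LANG)
    legacy = root.find(LEGACY_PATH)
    if legacy is not None and legacy.text:
        return legacy.text.strip()
    return None
```

Checking `dataset` first follows the usual xml:lang scoping, where a value on an inner element overrides the one inherited from its ancestors.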
Markdown is supported in several places. References to images are supported.
The current IPT and Registry implementations are using inlined, escaped HTML, though the EML schema says it should be DocBook elements. Changing to the DocBook elements might help interoperability with other publishers of EML, assuming they do things properly.
This seems important for a new profile version, but also needs less-compatible API changes to the Registry.
Citations can be provided in BibTeX as well as plain text strings.
These are all the elements of the GBIF extension:
- dateStamp: could perhaps be replaced with dataset/maintenance/changeHistory/changeDate, and at the same time add version information. (Someone has asked for that, I can't find the issue.)
- metadataLanguage: could use xml:lang on dataset instead
- hierarchyLevel: keep
- citation: keep, EML suggest generating a citation from the components (authors etc)
- bibliography: replace with dataset/literatureCited
- physical: move to dataset/physical
- resourceLogoUrl
- NCD elements. Should these all go in the new Semantic Metadata section? (And be replaced with TDWG Collection Descriptions elements?)
  - collection/parentCollectionIdentifier
  - collection/collectionName
  - collection/collectionIdentifier
  - formationPeriod
  - livingTimePeriod
  - specimenPreservationMethod
  - jgtiCuratorialUnit
- replaces: keep?

We should check if we can support a full EML document (with the elements not in our profile), so we can support VLIZ providing a more complete document (as above).
So we have been working here on creating a 2.2 profile based on our 2.1 profile. Since I could not attach an eml file to this comment, I have put it here instead https://drive.google.com/file/d/1GpxdscHCDappsk3GUMU4cmaRyt1AK_Ch/view?usp=sharing
What we have added to make our 2.2 are the following
We were thinking of asking EML to add some new elements in their next version - would be interesting to know if you also think these are important
While waiting for EML to respond to any request that I send, I was wondering whether GBIF would like to consider any of the elements I list above, that is, adding them in <additionalMetadata> by creating your own elements?
We're in the process of creating a metadata-schema-plugin for EML. We're a bit puzzled about the various version numbers used. Can you share a bit of history, or links to documentation, on how the current standards evolved?
In this repository, there is a schema https://github.com/gbif/eml-profile/blob/master/eml-gbif-profile.xsd which is quite different from the schema at https://github.com/gbif/rs.gbif.org/tree/master/schema/eml-2.1.1
What is the relation between the two schemas?
For the 2.2.0 EML schema, do you envision a similar implementation of the gbif profile?