gbif / eml-profile

GBIF EML profile

how does eml-gbif relate to eml #1

Open pvgenuchten opened 4 years ago

pvgenuchten commented 4 years ago

We're in the process of creating a metadata-schema-plugin for EML. We're a bit puzzled about the various version numbers used. Can you share a bit of history, or links to documentation, on how the current standards evolved?

In this repository, there is a schema https://github.com/gbif/eml-profile/blob/master/eml-gbif-profile.xsd which is quite different from the schema at https://github.com/gbif/rs.gbif.org/tree/master/schema/eml-2.1.1

What is the relationship between the two schemas?

For the 2.2.0 EML schema, do you envision a similar implementation of the gbif profile?

timrobertson100 commented 4 years ago

Thanks @pvgenuchten - those questions really jog the memory, as this was all done around 2009, I think.

The people who did this aren't here to ask anymore, but I recall that EML was itself releasing 2.1.x at that time, and there were some issues that led us to host a version of the EML XSD. I could ping Matt Jones to see if he remembers, if that helps.

The GBIF EML profile was developed as a small subset of EML and then extended in the additionalMetadata element, following the recommendations of the time. There is some text around it here. To my knowledge, when it was first released, a GBIF metadata document would always validate against the EML schema too.

For the 2.2.0 EML schema, do you envision a similar implementation of the gbif profile?

We have no plans to, but I assume that at some point we will bump to a newer EML. I suspect it has changed significantly, so we'd follow whatever recommendations there are at the time.

Are you just exploring, or are you hitting real issues? If the latter, we can spend time investigating too.

pvgenuchten commented 4 years ago

Hi @timrobertson100, thanks for the quick reply. We are trying to update the GeoNetwork schema-plugin to 2.2.0, but we have some challenges with the 2.2.0 XSD. We noticed that https://github.com/gbif/eml-profile/blob/master/eml-gbif-profile.xsd works out of the box in GeoNetwork.

timrobertson100 commented 4 years ago

Mmm... I am sorry I am not more helpful. There was a genuine reason we ended up hosting XSDs but I am afraid that escapes me. @mdoering do you recall anything around this, please?

mdoering commented 4 years ago

The GBIF EML profile is a subset of the full EML. Our schema cherry-picks the parts we thought were relevant and supported by GBIF, so we had to create our own XSD using the proper EML namespace (2.1.1 at that time). Having several schemas for the same namespace is not uncommon in the XML world.

It also specifies exactly what GBIF supports in the additionalMetadata slot.

timrobertson100 commented 4 years ago

Thanks, @mdoering. There was a reason we ended up hosting the full EML XSDs too though, and I forget exactly why. I seem to recall something like a broken EML build - ring any bells?

mdoering commented 4 years ago

I think it was this: https://github.com/gbif/rs.gbif.org/commit/0f0ca8423e09267fd515617dee018e008163edee

Use GBIF hosted xml.xsd as W3C times out

We slightly modified the original schemas to use a local xml.xsd file to avoid the timeouts
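In practice, that modification amounts to pointing the schemas' import of the XML namespace at a local file instead of the W3C server. A minimal sketch of such a rewrite follows. It assumes the remote location is `http://www.w3.org/2001/xml.xsd` and the local copy is named `xml.xsd`; both are assumptions, so check the actual files in the commit. The helper name `localize_xml_import` is hypothetical:

```python
import re

# Assumed remote location of the XML namespace schema; verify against the real XSDs.
W3C_XML_XSD = "http://www.w3.org/2001/xml.xsd"


def localize_xml_import(xsd_text: str, local_path: str = "xml.xsd",
                        remote_url: str = W3C_XML_XSD) -> str:
    """Rewrite schemaLocation="<remote_url>" so it points at a local copy.

    Plain text substitution is used on purpose: round-tripping an XSD through
    ElementTree can drop or rename namespace prefixes that are only referenced
    inside attribute values such as type="eml:...", which would break the schema.
    """
    pattern = re.compile(r'schemaLocation=(["\'])' + re.escape(remote_url) + r'\1')
    # Keep whichever quote character the original attribute used.
    return pattern.sub(
        lambda m: f"schemaLocation={m.group(1)}{local_path}{m.group(1)}", xsd_text
    )
```

A text substitution keeps the rest of each XSD byte-for-byte identical, which makes the diff against the upstream schemas easy to review.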

timrobertson100 commented 4 years ago

Thanks @mdoering - there was something relating to EML XSDs before that (pre-github), but I forget the details now.

@pvgenuchten - is there anything more we can do here, or should we close the issue please?

kmexter commented 3 years ago

For the 2.2.0 EML schema, do you envision a similar implementation of the gbif profile?

FYI, we also have a question about moving over to 2.2. We are in the process of making our profile in EML 2.2. Mapping from 2.1 is not much of an issue, but adding the necessary semantic annotations will take us a bit longer. Since some of our data ends up in GBIF, we were wondering about this. Would you not be able to harvest/receive data that had eml.xml using EML 2.2?

timrobertson100 commented 3 years ago

Would you not be able to harvest/receive data that had eml.xml using eml 2.2?

My guess is that most likely we would not without some software changes. To be honest we've been neglecting this area mainly because there hasn't been much demand to change things and it's working out OK. If that demand came (e.g. a push from publishers) then we would re-prioritize of course.
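To make the question concrete: a harvester accepting both versions would first need to tell them apart, which it can do from the root element's namespace. A minimal sketch, assuming the root namespaces `eml://ecoinformatics.org/eml-2.1.1` and `https://eml.ecoinformatics.org/eml-2.2.0`; the function name `eml_version` is hypothetical. Because the GBIF profile reuses the EML namespace, this identifies the EML version only, not whether a document sticks to the GBIF subset.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Root-element namespaces of the two EML releases discussed in this thread.
EML_NAMESPACES = {
    "eml://ecoinformatics.org/eml-2.1.1": "2.1.1",
    "https://eml.ecoinformatics.org/eml-2.2.0": "2.2.0",
}


def eml_version(xml_text: str) -> Optional[str]:
    """Return "2.1.1" or "2.2.0" based on the root <eml> namespace, else None."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes qualified names as "{namespace}localname".
    if not root.tag.startswith("{"):
        return None
    namespace, _, local_name = root.tag[1:].partition("}")
    if local_name != "eml":
        return None
    return EML_NAMESPACES.get(namespace)
```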

kmexter commented 3 years ago

OK, I see. Makes sense. FYI, there is demand from various European FAIR data programmes for metadata to be semantically annotated. This is possible in a machine-interoperable, standardised way in EML 2.2, but not fully so in EML 2.1. That is why we are updating: to become more FAIR.
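For context, an EML 2.2 semantic annotation pairs a property URI with a value URI, each carrying a human-readable `label`. Below is a minimal ElementTree sketch of attaching one to a `dataset` element. The helper name `add_annotation` is hypothetical, and the sketch simply appends the element; it does not place it where the EML 2.2 schema's element order requires, so treat it as an illustration of the shape rather than schema-valid output.

```python
import xml.etree.ElementTree as ET


def add_annotation(element: ET.Element, property_uri: str, property_label: str,
                   value_uri: str, value_label: str) -> ET.Element:
    """Append an <annotation> with a propertyURI/valueURI pair to element.

    Child elements are unqualified, as children of eml:eml usually are.
    """
    annotation = ET.SubElement(element, "annotation")
    prop = ET.SubElement(annotation, "propertyURI", label=property_label)
    prop.text = property_uri
    value = ET.SubElement(annotation, "valueURI", label=value_label)
    value.text = value_uri
    return annotation
```

For example, `add_annotation(dataset, "http://purl.obolibrary.org/obo/IAO_0000136", "is about", term_uri, term_label)` states that the dataset is about a given vocabulary term.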

jdpye commented 3 years ago

Hi gang, sorry to revive this thread, but we recently held a set of workshops across Canada where we picked up DwC and EML and talked about making DwC-A for all sorts of data. One big ask from our colleagues on the federal side was for a good way to handle multilingual data properly. It looks like later EML versions allow every field to have a language attribute, which would solve Canada's requirement to give equal billing to both official languages, with obvious advantages for pan-European organizations too.

Is it still the plan to update the GBIF EML schema? If so, would there be interest in building the IPT's awareness of this aspect, and potentially going all the way to field-by-field multilingual support in GBIF-EML?
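As a rough illustration of that language attribute: in our reading of EML 2.2's multilingual string type, an element states its primary language in `xml:lang` and carries each translation in a nested `value` element with its own `xml:lang`. A minimal sketch under that assumption; the helper name `multilingual_element` is hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Dict

# ElementTree serializes this Clark-notation name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def multilingual_element(tag: str, primary_lang: str, text: str,
                         translations: Dict[str, str]) -> ET.Element:
    """Build e.g. <title xml:lang="en">...<value xml:lang="fr">...</value></title>."""
    element = ET.Element(tag, {XML_LANG: primary_lang})
    element.text = text
    for lang, translated in translations.items():
        value = ET.SubElement(element, "value", {XML_LANG: lang})
        value.text = translated
    return element
```

With this shape, an English and French title for a Canadian dataset travel together in one element, and a consumer that ignores `value` children still sees the primary-language text.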

CecSve commented 1 year ago

Would you not be able to harvest/receive data that had eml.xml using eml 2.2?

My guess is that most likely we would not without some software changes. To be honest we've been neglecting this area mainly because there hasn't been much demand to change things and it's working out OK. If that demand came (e.g. a push from publishers) then we would re-prioritize of course.

We have had some issues come up on the help desk in the last six months from publishers using EML version 2.1.1, so there seems to be a need to support this version. There was a miscommunication regarding EML versions and GBIF's EML schema, so please ignore this comment.

kmexter commented 1 year ago

Would you not be able to harvest/receive data that had eml.xml using eml 2.2?

My guess is that most likely we would not without some software changes. To be honest we've been neglecting this area mainly because there hasn't been much demand to change things and it's working out OK. If that demand came (e.g. a push from publishers) then we would re-prioritize of course.

We have had some issues come up on the help desk in the last six months from publishers using EML version 2.1.1, so there seems to be a need to support this version.

:-D :-D :-D

kmexter commented 1 year ago

So does that mean there are still no plans to update to EML 2.2? If it is a question of human resources, we in the open science team at VLIZ can help out, as we have been investigating converting our 2.1 documents to 2.2 for a while... The only reason we have not switched is that much of our data goes to GBIF, so we prefer to export an EML that GBIF can work with. However, several issues keep coming up for us: annotations (to make metadata more machine-accessible); the location of the physical-distribution module (GeoNetwork, for example, does not work well with it being placed in additionalMetadata); language (as @jdpye mentioned in a comment above, we in Belgium also have a multilingual audience); and other catalogues we export to, which have updated to EML 2.2. Not that I am trying to push here, or anything :-D.... just encouraging and offering a hand

timrobertson100 commented 1 year ago

Thank you very much @kmexter - it really is just a case of us not having gotten to it... Your offer to help is greatly appreciated.

As a starting point, I suggest we document the changes from our (fairly minimal) profile to the latest version. We can then refer to that in the GitHub issues for the code changes needed in the IPT and the Registry.

Would you be willing to try and start that perhaps?

kmexter commented 1 year ago

Sure, we can help: we did a similar exercise here quite a while ago, so I will have to dig out the notes and refresh my memory, but I'd be happy to share them with you.

timrobertson100 commented 1 year ago

Thank you very much

kmexter commented 1 year ago

You can email me directly at katrina.exter@vliz.be. Just tell me what you need, or we can have a chat. k

mdoering commented 1 year ago

I'm leaving a few notes here on changes in EML 2.2 that I would like to see discussed as candidates for a GBIF implementation, rather than just a blind 1:1 upgrade. In particular, the much improved support for bibliographic references using BibTeX would help a lot in making us more compatible with the metadata handled in ChecklistBank and ColDP.

mdoering commented 1 year ago

The official EML Data Paper Example seems useful to look at: https://github.com/NCEAS/eml/blob/main/src/test/resources/eml-data-paper.xml

mike-podolskiy90 commented 1 year ago

I've created a new GBIF EML profile of EML 2.2 (https://rs.gbif-uat.org/schema/eml-gbif-profile/1.3/eml.xsd). It is now deployed to the GBIF UAT environment, including the IPT (ipt.gbif-uat.org) and the API.

The goal was to migrate with minimal changes, so any further improvements will be added incrementally.

MattBlissett commented 1 year ago

I've gone through the new features of EML 2.2.0, as well as older elements that weren't included in the GBIF Metadata Profile. I'll list them together: it has been over 10 years since the previous version, so it's worth reconsidering everything. Elements new in 2.2.0 are marked with ¤.

EML 2.2.0: https://eml.ecoinformatics.org/whats-new-in-eml-2-2-0.html

EML 2.1.1: https://sbclter.msi.ucsb.edu/external/InformationManagement/EML_211_schema/docs/eml-2.1.1/

GBIF Metadata Profile: https://ipt.gbif.org/manual/en/ipt/latest/gbif-metadata-profile

New or absent elements.

Those I've put in bold seem good contenders for GBIF's support.

Multilingual support

We describe this in the GMP guide, but never implemented it in the Registry. Now we can implement the newer method.

This seems important for a new profile version, but exposing the result requires Registry API changes that are not fully backward compatible. Perhaps we could implement something without changing /v1/dataset (e.g. return English, use the Accept-Language header, or use the primary language of the dataset), and use /v1/dataset/..../document to expose the full details, giving us time to decide on /v2/dataset.
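The Accept-Language option could look roughly like the following sketch: honor the client's preferences in quality order, then fall back to the dataset's primary language, then English. The function names and the fallback order are assumptions for illustration, not a decided Registry behavior; translations are assumed to be keyed by lowercase language tag.

```python
from typing import Dict, List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language header into (tag, quality) pairs, highest first."""
    prefs: List[Tuple[str, float]] = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag, quality))
    # sorted() is stable (also with reverse=True), so ties keep header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def pick_translation(values: Dict[str, str], accept_language: Optional[str],
                     primary_lang: str) -> str:
    """Choose which translation of a field to return (hypothetical policy).

    values maps lowercase language tags to text and must not be empty.
    """
    for tag, quality in parse_accept_language(accept_language or ""):
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        if tag in values:
            return values[tag]
        base = tag.split("-")[0]  # "fr-ca" -> "fr"
        if base in values:
            return values[base]
    for fallback in (primary_lang.lower(), "en"):
        if fallback in values:
            return values[fallback]
    return next(iter(values.values()))
```

Because `/v1/dataset` would still return a single string per field, existing clients keep working, while clients that send the header get a better match.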

The existing additionalMetadata/metadata/gbif/metadataLanguage element should be replaced by the xml:lang attribute on the eml or dataset element.
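A minimal sketch of that migration, assuming the GBIF element holds a three-letter ISO 639-2 code such as `eng` and that the attribute goes on the root `eml` element (the `dataset` element would work the same way). The helper name and the small code table are hypothetical; a real migration would need a complete ISO 639-2 to BCP 47 mapping.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, deliberately tiny mapping from ISO 639-2 (terminology and
# bibliographic forms) to the two-letter tags normally used in xml:lang.
ISO639_2_TO_1 = {
    "eng": "en", "fra": "fr", "fre": "fr", "nld": "nl", "dut": "nl",
    "deu": "de", "ger": "de", "spa": "es",
}


def move_metadata_language(eml_root: ET.Element) -> bool:
    """Move gbif/metadataLanguage onto xml:lang of the root; True if moved."""
    # Children of eml:eml are unqualified, so a plain path works here.
    for gbif in eml_root.findall("additionalMetadata/metadata/gbif"):
        element = gbif.find("metadataLanguage")
        if element is None or not (element.text or "").strip():
            continue
        code = element.text.strip().lower()
        # Unknown codes are passed through unchanged.
        eml_root.set(XML_LANG, ISO639_2_TO_1.get(code, code))
        gbif.remove(element)
        return True
    return False
```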

Markdown

Markdown is supported in several places, including references to images.

The current IPT and Registry implementations are using inlined, escaped HTML, though the EML schema says it should be DocBook elements. Changing to the DocBook elements might help interoperability with other publishers of EML, assuming they do things properly.

This seems important for a new profile version, but it also needs Registry API changes that are not fully backward compatible.
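To show the difference, here is a minimal sketch of writing the same text either as DocBook-style `para` elements or as an EML 2.2 `markdown` element. It assumes paragraphs are separated by blank lines and uses a hypothetical helper name. Note that ElementTree escapes any markup in element text, so HTML passed in would be stored escaped (as in the current implementations) rather than as structure.

```python
import xml.etree.ElementTree as ET


def text_type(tag: str, text: str, as_markdown: bool) -> ET.Element:
    """Build e.g. <abstract> as <para> blocks or as a single <markdown> child."""
    element = ET.Element(tag)
    if as_markdown:
        ET.SubElement(element, "markdown").text = text
    else:
        # Assumption: blank lines separate paragraphs.
        for block in text.split("\n\n"):
            block = block.strip()
            if block:
                ET.SubElement(element, "para").text = block
    return element
```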

BibTeX

Citations can be provided either in BibTeX or as plain text strings.
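A minimal sketch of producing a BibTeX citation for EML 2.2. It assumes a `bibtex` element inside `literatureCited`, which is our reading of the EML 2.2.0 notes and should be checked against the schema. The helper names are hypothetical, and field values are not escaped, so braces inside values would need handling in real use.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_bibtex(entry_type: str, key: str, fields: Dict[str, str]) -> str:
    """Render one BibTeX entry, wrapping each field value in braces."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


def literature_cited(entries: List[str]) -> ET.Element:
    """Wrap BibTeX entries in <literatureCited><bibtex> (assumed EML 2.2 layout)."""
    cited = ET.Element("literatureCited")
    ET.SubElement(cited, "bibtex").text = "\n\n".join(entries)
    return cited
```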

GBIF Extension

These are all the elements of the GBIF extension:

Other observations

We should check whether we can support a full EML document (including elements not in our profile), so that we can accept the more complete documents VLIZ provides (as above).
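One way to start that check is to list what a document contains beyond the profile. A minimal sketch follows; the supported-element list passed in is a placeholder, not the actual GBIF profile, only direct children of `dataset` are inspected, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List


def unsupported_dataset_elements(eml_root: ET.Element,
                                 supported: Iterable[str]) -> List[str]:
    """Names of direct <dataset> children not in `supported`, in document order."""
    allowed = set(supported)
    dataset = eml_root.find("dataset")  # unqualified child of eml:eml
    if dataset is None:
        return []
    found: List[str] = []
    for child in dataset:
        if child.tag not in allowed and child.tag not in found:
            found.append(child.tag)
    return found
```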

kmexter commented 1 year ago

So we have been working here on creating a 2.2 profile based on our 2.1 profile. Since I could not attach an EML file to this comment, I have put it here instead: https://drive.google.com/file/d/1GpxdscHCDappsk3GUMU4cmaRyt1AK_Ch/view?usp=sharing

To make our 2.2 profile, we have added the following:

We were thinking of asking EML to add some new elements in their next version; it would be interesting to know whether you also think these are important.

While waiting for EML to respond to any request I send, I was wondering whether GBIF would like to consider any of the elements I list above, for example by adding them to <additionalMetadata> as your own elements?