gbif-norway / helpdesk

Please submit your helpdesk request here (or send an email to helpdesk@gbif.no). We will also use this repo for documentation of node helpdesk cases.
GNU General Public License v3.0

Create a JavaScript-only HTML form which generates EML XML #67

Closed (rukayaj closed this issue 2 years ago)

rukayaj commented 2 years ago

Request from Luke Marsden:

I want to create an EML metadata template that includes each metadata term, a description, and whether each term is required, highly recommended or optional. This should mimic the terms that one can fill in using the IPT. However, in some cases we might want to create our own Darwin Core Archives without using the IPT. Therefore, it would be useful to have a template for people to fill in, from which I can write a script to harvest the metadata if I want to create a DwCA myself.
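One way to picture the template-plus-harvesting idea is a CSV with one row per term (term, description, requirement level, value) that a script reads back. The minimal sketch below assumes that layout; the four terms and their requirement levels are hypothetical placeholders, and the real list should come from the IPT metadata form or the GBIF EML profile.

```python
import csv
import io

# Hypothetical subset of metadata terms with illustrative requirement levels.
# The real list should be taken from the IPT form / GBIF EML profile.
TEMPLATE_TERMS = [
    ("title", "Title of the dataset", "required"),
    ("abstract", "Short description of the dataset", "required"),
    ("creator_email", "Email address of the resource creator", "highly recommended"),
    ("keywords", "Comma-separated keywords", "optional"),
]

FIELDNAMES = ["term", "description", "requirement", "value"]


def write_template(out):
    """Write a blank template: one row per term, with an empty 'value' column."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for term, description, requirement in TEMPLATE_TERMS:
        writer.writerow({"term": term, "description": description,
                         "requirement": requirement, "value": ""})


def harvest(inp):
    """Read a filled-in template back into {term: value}, skipping empty values."""
    values = {}
    for row in csv.DictReader(inp):
        value = (row.get("value") or "").strip()
        if value:
            values[row["term"]] = value
    return values


def missing_required(values):
    """Return the terms marked 'required' that were left empty."""
    return [t for t, _, req in TEMPLATE_TERMS if req == "required" and t not in values]
```

Keeping the description and requirement level in the same file as the value means the person filling it in never needs a separate reference document, which matters if the template is used offline.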

I have volunteered to essentially recreate the IPT metadata form for him using JavaScript.
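The core job of such a form is turning filled-in values into an EML document. The sketch below shows that step in Python rather than the form's JavaScript, covering only title, creator email and abstract. The namespace (`eml://ecoinformatics.org/eml-2.1.1`) and the `system="http://gbif.org"` attribute are assumptions based on what GBIF-profile EML has commonly used; check rs.gbif.org for the version your IPT or validator expects.

```python
import xml.etree.ElementTree as ET

# Assumed EML namespace; verify against the GBIF EML profile you target.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
ET.register_namespace("eml", EML_NS)


def build_eml(values, package_id):
    """Build a minimal EML document from a {term: value} dict.

    Only a few elements are handled; a complete form also covers coverage,
    methods, project, contacts and more. Child elements are emitted in the
    order EML expects (title, creator, ..., abstract)."""
    root = ET.Element(f"{{{EML_NS}}}eml",
                      {"packageId": package_id, "system": "http://gbif.org"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = values["title"]
    if "creator_email" in values:
        creator = ET.SubElement(dataset, "creator")
        ET.SubElement(creator, "electronicMailAddress").text = values["creator_email"]
    if "abstract" in values:
        abstract = ET.SubElement(dataset, "abstract")
        ET.SubElement(abstract, "para").text = values["abstract"]
    return ET.tostring(root, encoding="unicode")
```

For example, `build_eml({"title": "Test dataset"}, "test-1")` returns a string whose root is `eml:eml` with a single `dataset/title` child.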

dagendresen commented 2 years ago

It might be useful to read the terms in from rs.gbif.org, in case they change (perhaps even crawling the different versions of the GBIF EML schema to find the most recent one...).
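If the schema versions are published as version-named folders (assumed here to look like `1.0.2`, `1.1`, `1.2` under something like `rs.gbif.org/schema/eml-gbif-profile/`), picking the newest one needs a numeric comparison, not a string sort. This sketch assumes the folder names have already been fetched; downloading and parsing the directory listing is not shown.

```python
import re


def latest_version(names):
    """Return the highest version among names such as '1.1' or '1.0.2'.

    Entries that are not plain dotted numbers are ignored. Versions are
    compared as integer tuples, so '1.10' ranks above '1.9'."""
    versions = [n for n in names if re.fullmatch(r"\d+(\.\d+)*", n)]
    if not versions:
        return None
    return max(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
```

A plain `max(names)` would rank `'1.9'` above `'1.10'`, which is the main reason for the tuple key.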

I guess the use case is to have a template to bring into the field (where there might be no Internet).

rukayaj commented 2 years ago

There are also some properties files with the EML fields used by the IPT, plus their translations, here: https://github.com/gbif/ipt/tree/master/src/main/resources (e.g. ApplicationResources_en.properties).
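Reading labels and help text from those files means parsing Java `.properties` syntax. The sketch below is a deliberately simplified parser: it handles comments, blank lines and `=` or `:` separators, but not line continuations, whitespace-only separators or unicode escapes. The `eml.` key prefix used for filtering is an assumption; check the actual key names in the IPT resources.

```python
def parse_properties(text):
    """Parse simple Java .properties content into a dict.

    Supports '#'/'!' comment lines, blank lines, and the first '=' or ':'
    as the key/value separator. Line continuations and escape sequences
    are NOT handled; a full parser would need them."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split on whichever separator appears first on the line.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            props[line] = ""
            continue
        i = min(positions)
        props[line[:i].strip()] = line[i + 1:].strip()
    return props


def eml_labels(props, prefix="eml."):
    """Keep only keys under an assumed 'eml.' prefix."""
    return {k: v for k, v in props.items() if k.startswith(prefix)}
```

Loading one file per language (e.g. `_en`, `_no`) into separate dicts would let the form switch label language without duplicating field definitions.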

lhmarsden commented 2 years ago

The use case is more to facilitate creating a DwCA when one does not have access to an IPT, or when it may need to be customised (for example if I am creating an eMoF extension or ResourceRelationship extension).

We will also be looking to incorporate a similar functionality into our sample log template generator: https://sios-svalbard.org/cgi-bin/darwinsheet/?setup=aen

I will be updating the 'metadata' sheet that is generated with each file to include either:

  1. EML fields, descriptions and whether each is required or recommended.
  2. ACDD metadata, for cases where the ultimate destination of the data will be a NetCDF-CF file rather than a DwCA, since we collect a lot of physical data in Nansen Legacy too. https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3
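Where the same collected metadata may end up in either an EML file or a NetCDF-CF file, a crosswalk can translate harvested values into ACDD global attributes. The ACDD attribute names below (`title`, `summary`, `keywords`, `creator_name`, `creator_email`, `time_coverage_start`, `time_coverage_end`) come from ACDD 1.3, but the left-hand term names and the mapping itself are hypothetical and should be reviewed against both conventions.

```python
# Hypothetical crosswalk: template term -> ACDD 1.3 global attribute.
TERM_TO_ACDD = {
    "title": "title",
    "abstract": "summary",
    "keywords": "keywords",
    "creator_name": "creator_name",
    "creator_email": "creator_email",
    "start_date": "time_coverage_start",
    "end_date": "time_coverage_end",
}


def to_acdd(values):
    """Translate {term: value} metadata into ACDD global attributes.

    Terms without an ACDD counterpart are returned separately so they are
    not silently dropped."""
    attrs, unmapped = {}, {}
    for term, value in values.items():
        if term in TERM_TO_ACDD:
            attrs[TERM_TO_ACDD[term]] = value
        else:
            unmapped[term] = value
    return attrs, unmapped
```

The resulting `attrs` dict could then be written as global attributes, for instance with netCDF4's `Dataset.setncatts(attrs)`.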

However, most of the Nansen Legacy cruises have already been completed, and the sample logs filled in without these additions. Therefore, when someone already has the data and doesn't have access to an IPT, it will be beneficial to be able to direct them to the "IPT metadata form" that @rukayaj is working on.

I suppose it would also be a useful form to send to people who provide you with a spreadsheet/CSV file etc. and request that you at GBIF (or any other data centre for that matter) convert it to a DwCA for them? Therefore, hosting this online somewhere could be beneficial. Or perhaps I have misunderstood how these interactions usually work.

rukayaj commented 2 years ago

> I suppose it would also be a useful form to send to people who provide you with a spreadsheet/CSV file etc. and request that you at GBIF (or any other data centre for that matter) convert it to a DwCA for them? Therefore, hosting this online somewhere could be beneficial. Or perhaps I have misunderstood how these interactions usually work.

Well, we actually usually provide people with a user account with basic rights on our IPT and ask them to fill the metadata in on there, which is pretty easy for them to do and so far I have never come across anyone who has a problem doing it.

So I am not sure if we will actually be using it for our users. But it sounds like your users will not actually ever come into contact with an IPT, right? Although I see in your other email you do say the data is ultimately going to get published via an IPT?

At any rate, I think this might be useful for others, as quite a few people use the online Darwin Core Occurrence and Event core spreadsheet generator.

lhmarsden commented 2 years ago
> So I am not sure if we will actually be using it for our users. But it sounds like your users will not actually ever come into contact with an IPT, right? Although I see in your other email you do say the data is ultimately going to get published via an IPT?

We currently don't have an IPT server installed. Our users may someday be able to get an account and log in, but can't as of yet. It was also my impression that not all DwCAs can be created using the IPT, for example if they have column headers that are not Darwin Core terms, or include extensions that currently aren't recognized. Is this the case?

rukayaj commented 2 years ago

Well, the extensions would at a minimum need to be in the rs.gbif.org sandbox. But yes, if there are column headers which are not Darwin Core Terms then they would be excluded from the DwCA that the IPT creates.

But can you really say it is a DwCA if it's not using DwC terms? What column headers are you thinking about which won't fit in? There are plenty of flexible extensions like Resource Relationship and flexible fields like dynamicProperties which can take most stuff I would think?
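As an illustration of the dynamicProperties route, non-Darwin Core columns in a record can be packed into a JSON string instead of being dropped. The set of "known" terms below is a tiny hypothetical subset for the sketch only; a real check should use the full Darwin Core term list. The sketch also assumes the input row has no existing `dynamicProperties` column.

```python
import json

# Hypothetical, deliberately small subset of Darwin Core terms.
DWC_TERMS = {"occurrenceID", "eventDate", "scientificName",
             "decimalLatitude", "decimalLongitude"}


def fold_into_dynamic_properties(row):
    """Keep Darwin Core columns as they are and pack every other non-empty
    column into a dynamicProperties JSON string.

    Assumes the row does not already contain a dynamicProperties column."""
    core = {k: v for k, v in row.items() if k in DWC_TERMS}
    extra = {k: v for k, v in row.items()
             if k not in DWC_TERMS and v not in ("", None)}
    if extra:
        core["dynamicProperties"] = json.dumps(extra, sort_keys=True)
    return core
```

This keeps the extra information in the archive, although values inside dynamicProperties are much harder for downstream users to query than proper terms or extension rows.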

lhmarsden commented 2 years ago

For example, dwciri:measurementType or measurementTypeID in a measurementorfact or extendedmeasurementorfact extension.

rukayaj commented 2 years ago

No, both of those should be included in the DwCA file if you use the extendedmeasurementorfact extension. Extensions get added to DwCAs; you can see an example of measurementorfact if you download the DwCA for this resource: https://ipt.gbif.no/resource?r=radiolaria. Or perhaps I'm misunderstanding what you're saying?

lhmarsden commented 2 years ago

As far as I understand, it is not possible to create a DwCA with these column headers using the IPT.

dagendresen commented 2 years ago

Yes, the dwciri MoF terms are only in the rs.gbif.org/sandbox -- we should remind the GBIF Secretariat developer to prioritize moving these into production :-)

dagendresen commented 2 years ago

The IPT in demo mode can build DwCA with these sandbox extensions -- however, before moving these data definitions to production I am unsure how useful this actually is...

lhmarsden commented 2 years ago

Okay.

I will try to encourage researchers to use Darwin Core terms wherever possible, and they mostly will. However, I can also envisage that people will sometimes want to add miscellaneous columns too, even if I would rather they stuck to dynamicProperties or MoF terms.

Agreed; without the definitions that is less helpful.

rukayaj commented 2 years ago

I just added a test ExtendedMeasurementOrFact file here - https://ipt-test.gbif.no/resource?r=testforsios - so you can see it is included in the dwca download.

ExtendedMeasurementOrFact is also in production I think?

If people add miscellaneous columns it's usually not such a hassle to map them onto something else or convert them into an extension file.
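Converting miscellaneous measurement columns into an extension file is mostly a wide-to-long reshape: each non-empty measurement cell becomes its own row linked to the core record. In this sketch the column-to-(measurementType, measurementUnit) mapping is hypothetical, and the `id` key stands for whatever field the archive's meta.xml declares as the core ID link.

```python
# Hypothetical mapping: spreadsheet column -> (measurementType, measurementUnit).
MEASUREMENT_COLUMNS = {
    "temp_c": ("temperature", "degC"),
    "depth_m": ("depth", "m"),
}


def to_emof(rows, id_field="occurrenceID"):
    """Reshape wide rows (one column per measurement) into long
    MeasurementOrFact-style rows, one per non-empty measurement, each
    carrying the ID of the core record it belongs to."""
    out = []
    for row in rows:
        for column, (mtype, unit) in MEASUREMENT_COLUMNS.items():
            value = row.get(column)
            if value in ("", None):
                continue  # empty cells produce no extension row
            out.append({
                "id": row[id_field],
                "measurementType": mtype,
                "measurementValue": value,
                "measurementUnit": unit,
            })
    return out
```

A real conversion would normally also fill measurementTypeID and measurementUnitID with vocabulary IRIs where available.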

dagendresen commented 2 years ago

I added a question on the status for moving the sandbox MoF to production here: https://github.com/gbif/rs.gbif.org/issues/80#issuecomment-986859243

rukayaj commented 2 years ago

@lhmarsden, @MichalTorma and I had a discussion, and I think we concluded that in many cases it will probably be possible for Nansen Legacy researchers to access an IPT, where they can create the EML file before going on a trip (there is no internet connectivity while actually on a vessel).

@lhmarsden did say that some metadata is generated while actually on the vessel, and gave an example of different people taking different samples at different times for different reasons; it sounds like this isn't always 100% mapped out beforehand and can happen on an ad-hoc basis. So in this case an offline form would be useful.

Additionally, some of the Living Norway group have said they are also interested in this tool.

So, with that in mind, there is now a version of it on GitHub Pages, https://gbif-norway.github.io/eml-generator-js/, ready for testing. To use it offline, you just have to save the page locally (e.g. using Ctrl + S).

There is a link to this EML form on the DwC Excel template generator, and I have added Google Analytics tracking so we can see if there is any uptake (note: this will obviously not work for those who use it offline). I am going to close the issue here, and we can add issues for this tool to https://github.com/gbif-norway/eml-generator-js.

Next steps / improvements

It is built with the JSON Schema standard and jsonforms.io (React). The form schema is hard-coded, but should ideally be generated from the EML schema so it can easily be updated whenever the schema changes. It should also read labels and help text from the IPT translations, so we can update those in one place too. Another big technical improvement would be to add some end-to-end tests.
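Generating the form schema from the EML schema could start with a mapping from XSD element declarations to JSON Schema properties. The sketch below handles only a flat `xs:sequence` of named elements: `maxOccurs="unbounded"` becomes an array and anything without `minOccurs="0"` is treated as required (XSD's default minOccurs is 1). The real EML schema uses nested complex types, `ref` attributes and imports across modules, none of which are resolved here, and the test snippet is a simplified, hypothetical excerpt.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def sequence_to_json_schema(xsd_text):
    """Convert the named xs:element children of the first xs:sequence in an
    XSD into a flat JSON Schema object."""
    root = ET.fromstring(xsd_text)
    sequence = root.find(f".//{XS}sequence")
    properties, required = {}, []
    if sequence is None:
        return {"type": "object", "properties": properties, "required": required}
    for el in sequence.findall(f"{XS}element"):
        name = el.get("name")
        if name is None:
            continue  # e.g. ref= declarations; a full generator would resolve these
        if el.get("maxOccurs") == "unbounded":
            prop = {"type": "array", "items": {"type": "string"}}
        else:
            prop = {"type": "string"}
        properties[name] = prop
        if el.get("minOccurs", "1") != "0":
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}
```

The output is a plain dict that could be serialised with `json.dumps` and handed to a JSON Schema form renderer such as jsonforms.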

It probably needs save/load functionality added as well, as the form is so long and exhausting to fill in. Right now, to make testing easier, it doesn't have required-field validation. It's also missing a few fields, and generates some by default (e.g. language, since I think we should just always have Norwegian researchers fill it out in English, as everyone here has such good English skills). I actually think it would benefit from further simplification, with some more fields removed.
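Save/load can be as simple as serialising the form state to a JSON file the user downloads and later reopens, with a version tag so that drafts from an older form layout can be flagged. In the browser this would be JavaScript (a file download or localStorage); the Python sketch below only shows the data round trip. `FORM_VERSION` and the draft layout are hypothetical.

```python
import json

FORM_VERSION = "1"  # hypothetical version tag for the current form layout


def save_draft(values):
    """Serialise a partially filled form so it can be stored and reloaded later."""
    return json.dumps({"formVersion": FORM_VERSION, "values": values},
                      indent=2, sort_keys=True)


def load_draft(text, required=()):
    """Reload a draft and report which required fields are still empty and
    whether it was saved with a different form version."""
    data = json.loads(text)
    values = data.get("values", {})
    warnings = []
    if data.get("formVersion") != FORM_VERSION:
        warnings.append(f"draft saved with form version {data.get('formVersion')!r}")
    missing = [f for f in required if not str(values.get(f, "")).strip()]
    return values, missing, warnings
```

Checking required fields at load time, rather than blocking input, fits a long form that people fill in over several sessions.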

dagendresen commented 2 years ago

See also https://github.com/gbif/ipt/wiki/resourceMetadata