globalbioticinteractions / carvalheiro2023

GloBI configuration to help index Luisa Carvalheiro, José Augusto Salim, Filipi Soares, Debora Drucker. 2023. WorldFAIR pilot data from: VisitationData_Luisa_Carvalheiro.

Update globi.json to include EML terms and license attributions (with examples) #2

Open zedomel opened 8 months ago

jhpoelen commented 8 months ago

from meeting 2023-11-27 https://docs.google.com/document/d/1MKUFLdGscFODvkW8NfzrP8LhJbDks3L0FXLyae6yZ34/edit

Filipi-Soares commented 8 months ago

Add a basic description to the "Intellectual rights" metadata element, then map it to a machine-readable license, such as a CC license.

zedomel commented 8 months ago

Please see eml-jsonld.json. It includes the term licensed with a URL for the CC-BY license.

I'm wondering if it could serve as an alternative to the globi.json file, as discussed in #1.

:-D

jhpoelen commented 8 months ago

@zedomel thanks for the license example. In line with DataOne's @mbjones, I'd favor using eml.xml instead of the eml.json (JSON-LD) representation. However, I am open to being persuaded otherwise.

Can we close this issue now that we have a license example?

zedomel commented 8 months ago

@jhpoelen Historically, EML has been serialized in XML. But I don't see any reason not to allow JSON, since the two formats can be converted into each other.

I just think that creating JSON files by hand is easier than creating XML. But we could think about a (new or existing) tool or web form where users can create and edit metadata and generate EML as JSON or XML. Years ago, I used Morpho from NCEAS to document datasets when using Metacat, but I'm not sure if it is still in use or how simple or complex it is to produce such files.

What do you think?

mbjones commented 8 months ago

@zedomel Hello! It's been a long time -- great to hear from you (albeit indirectly).

Morpho has been deprecated for several years and is no longer maintained. We have replaced it with an open source, web-based metadata editor, MetacatUI, that produces EML and is maintained by @robyngit and our team at NCEAS. Under the hood, we have a JavaScript metadata model that is used in the web client and could be more broadly exposed as JSON. See https://nceas.github.io/metacatui/

In addition, @cboettig wrote and we help support two R packages: EML, which provides a high-level interface for creating and managing EML programmatically, and emld, which provides a JSON-LD model for EML and represents the underlying data model of EML in JSON-LD. emld manages EML metadata in a formal JSON-based graph model, and can serialize from and to both JSON and XML. As JSON-LD, we can treat the data as RDF and query it with SPARQL. There are, however, some challenges. JSON itself is not a schema language: it lacks the expressive constraints of XML Schema, and it also differs from JSON-LD (for example, in how element ordering is handled). Consequently, there are a variety of perfectly valid EML metadata documents that our current JSON-LD-based serialization does not handle correctly. Fixed ordering of creator lists and attribute lists is one of those challenging areas (JSON is basically a set of unordered name-value pairs). There are also a number of challenges in handling escaped markup and markdown data (for example, see https://github.com/ropensci/EML/issues/315). But I think these challenges are all solvable.
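To make the ordering concern concrete, here is a minimal Python sketch that uses only the standard library (it does not use emld or the EML R package). The EML-like fragment and the creator names are invented for illustration. It shows that a plain JSON array keeps the creator order found in the XML, whereas a JSON-LD array is treated as an unordered set unless the context declares `"@container": "@list"`; a `frozenset` stands in for that set semantics here.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EML-like fragment; names are invented.
EML_FRAGMENT = """
<dataset>
  <creator><surName>Alpha</surName></creator>
  <creator><surName>Beta</surName></creator>
  <creator><surName>Gamma</surName></creator>
</dataset>
"""


def creators_in_document_order(xml_text):
    """Return creator surnames in the order they appear in the XML."""
    root = ET.fromstring(xml_text)
    return [creator.findtext("surName") for creator in root.findall("creator")]


creators = creators_in_document_order(EML_FRAGMENT)

# A plain JSON array round-trips with its order intact.
as_json = json.dumps({"creator": creators})
round_tripped = json.loads(as_json)["creator"]

# Under JSON-LD set semantics the same values carry no order;
# a frozenset models that loss of ordering information.
as_unordered = frozenset(creators)
```

Any serialization that must preserve author order would therefore need an explicit ordered container (or an equivalent convention) rather than relying on the default array interpretation.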

And, regardless of those challenges, I think having a community-supported, cross-language JSON-LD serialization of EML would be a fantastic thing. MetacatUI could be extended to support JSON-LD serializations, as the serialization code is quite modular. If anyone wants to contribute, please get in touch and we're happy to get involved. Until there is wide support for it, though, it's probably better to use the XML format for compatibility.

See also:

jhpoelen commented 8 months ago

@mbjones thanks for chiming in and adding context.

@zedomel Do you think it might be a good idea to ship both eml.json and eml.xml? If so, it'd be nice to have a way to automatically generate one from the other.

Aside from the neat R libraries, are you aware of any EML conversion tools accessible on the command line?

zedomel commented 8 months ago

> @zedomel Hello! It's been a long time -- great to hear from you (albeit indirectly).

Hi @mbjones, it's great to hear from you too. I think you would like to know that @jhpoelen and I are collaborating with Debora Drucker. Debora and I have been working together since we first met during the installation of PPBio Metacat.

> Until there is wide support for it, though, its probably better to use the XML format for compatibility.

@jhpoelen Shipping both would be great, but if we need to choose, it's better to use XML.

I have used this Python library to read and generate EML files:

It uses xsdata for XML serialization, but xsdata can also produce JSON (JsonSerializer). Maybe we can implement a simple command-line tool using this library to convert EML between XML and JSON.
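As a rough idea of what such a converter could look like, here is a minimal sketch. It deliberately does not use xsdata or the EML library mentioned above; it relies only on the Python standard library, and the JSON layout (`tag`, `attributes`, `text`, `children`) is a hypothetical generic tree, not the emld JSON-LD model. Children are kept in a list so that element order survives the round trip. Mixed content (text after child elements) and insignificant whitespace are dropped, so this is an illustration rather than a faithful EML converter.

```python
import argparse
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an XML element into a dict; children stay in an ordered list."""
    node = {"tag": elem.tag}
    if elem.attrib:
        node["attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in elem]
    if children:
        node["children"] = children
    return node


def dict_to_element(node):
    """Rebuild an XML element from the dict layout used above."""
    elem = ET.Element(node["tag"], node.get("attributes", {}))
    if "text" in node:
        elem.text = node["text"]
    for child in node.get("children", []):
        elem.append(dict_to_element(child))
    return elem


def xml_to_json(xml_text):
    """Serialize an XML document as a JSON tree."""
    return json.dumps(element_to_dict(ET.fromstring(xml_text)), indent=2)


def json_to_xml(json_text):
    """Serialize a JSON tree (as produced by xml_to_json) back to XML."""
    return ET.tostring(dict_to_element(json.loads(json_text)), encoding="unicode")


def main(argv=None):
    """Command-line entry point: choose a direction and an input file."""
    parser = argparse.ArgumentParser(
        description="Convert between XML and a generic JSON tree."
    )
    parser.add_argument("direction", choices=["xml2json", "json2xml"])
    parser.add_argument("path", help="input file to convert")
    args = parser.parse_args(argv)
    with open(args.path, encoding="utf-8") as handle:
        content = handle.read()
    convert = xml_to_json if args.direction == "xml2json" else json_to_xml
    print(convert(content))
```

Calling `main(["xml2json", "eml.xml"])`, or invoking `main()` from a script entry point, would print the JSON form of the file. A version built on xsdata would instead bind the document to typed classes, which also allows validation against the EML schema.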

What do you think?

I have also used https://ezeml.edirepository.org/eml/about as suggested by @mbjones. It should probably be included in any guidelines we produce in the WorldFAIR project as a tool for users to generate their own EML files.