OHDSI / GIS

https://ohdsi.github.io/GIS
Apache License 2.0

[Use Case]: Evaluating fitness of WorldFAIR for OHDSI/GIS #324

Open kzollove opened 4 months ago

kzollove commented 4 months ago

Project Lead:

@jaygee-on-github

Purpose:

This is the specification we will be evaluating to determine its fitness to purpose:

DiscoverabilityDraftForZenodo.pdf

The specification proposes some metadata content that we can use to mark up any digital object for the purpose of discovery. The metadata content has been taken from many standards including Dublin Core, ISO19115-1, schema.org conventions from ESIPFed Science on Schema.org and Ocean Data net, DCAT, DCAT-AP, and FDO Kernel Attributes-2.0.

The specification maps this content into a set of JSON-LD nodes in a knowledge graph. Each node has a property and ultimately a value taken from the use case. The knowledge graph is machine readable and can be queried by a software agent. It can also be validated using SHACL rules in specific use cases.

One use case for this specification is a catalog of datasets. In this context the specification provides mark up and a knowledge graph at both the dataset and the variable levels. Variable level metadata can be more or less advanced.
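As a rough illustration of the node / property / value structure described above, here is a minimal sketch in plain Python. The identifiers, names, and units are hypothetical and not taken from the specification; the only schema.org terms used are `Dataset`, `PropertyValue`, `name`, `unitText`, and `variableMeasured`. The `triples` and `variables_of` helpers are made up for this example and stand in for what a real RDF library or SPARQL query would do.

```python
import json

# Hypothetical catalog entry: one dataset-level node linked to one
# variable-level node. All @id values are illustrative only.
doc = {
    "@context": {"@vocab": "https://schema.org/"},
    "@graph": [
        {
            "@id": "https://example.org/dataset/air-quality",
            "@type": "Dataset",
            "name": "Example air quality dataset",
            "variableMeasured": {"@id": "https://example.org/variable/pm25"},
        },
        {
            "@id": "https://example.org/variable/pm25",
            "@type": "PropertyValue",
            "name": "PM2.5 concentration",
            "unitText": "ug/m3",
        },
    ],
}


def triples(document):
    """Yield (node id, property, value) for every node in the @graph."""
    for node in document["@graph"]:
        for prop, value in node.items():
            if prop == "@id":
                continue
            # A nested {"@id": ...} is a link to another node in the graph.
            if isinstance(value, dict) and "@id" in value:
                value = value["@id"]
            yield node["@id"], prop, value


def variables_of(document, dataset_id):
    """Return the names of the variables a dataset node links to."""
    by_id = {node["@id"]: node for node in document["@graph"]}
    return [
        by_id[value]["name"]
        for subject, prop, value in triples(document)
        if subject == dataset_id and prop == "variableMeasured"
    ]


print(json.dumps(doc, indent=2))
print(variables_of(doc, "https://example.org/dataset/air-quality"))
```

The same document could be loaded into a proper RDF store and queried there; the point here is only that dataset-level and variable-level metadata sit in one graph and can be followed by link.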

Tasks:

jaygee-on-github commented 4 months ago

In the tasks so far I didn't include development of an upper model based on Wild's exposome that we can use to classify all the catalog entries at the dataset level.

This appears in a presentation I made recently:

image

Here is the presentation

If we had an "upper model" that we could use as buckets in which to break out the datasets, then we would be positioned to create a catalog with three levels following the Arcus schema that the library science group developed at CHOP. In the Arcus model, a catalog consists of one or more collections, a collection contains one or more series, and a series consists of one or more files/datasets. INSPIRE has begun to engage with the Arcus group at CHOP, at least conceptually.
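To make the containment rule concrete, here is a small sketch of the catalog / collection / series / dataset nesting as Python dataclasses. This is not the Arcus schema itself: the class and field names are assumptions for illustration, and the only rule encoded is the "one or more" constraint at each level as described above.

```python
from dataclasses import dataclass
from typing import List


def _require_some(items, what):
    """Enforce the 'one or more' rule at each level of the hierarchy."""
    if not items:
        raise ValueError(f"{what} must contain at least one item")


@dataclass
class Dataset:
    name: str


@dataclass
class Series:
    name: str
    datasets: List[Dataset]

    def __post_init__(self):
        _require_some(self.datasets, "series")


@dataclass
class Collection:
    name: str
    series: List[Series]

    def __post_init__(self):
        _require_some(self.series, "collection")


@dataclass
class Catalog:
    name: str
    collections: List[Collection]

    def __post_init__(self):
        _require_some(self.collections, "catalog")

    def datasets(self):
        """Flatten the hierarchy into a list of all datasets."""
        return [
            d
            for c in self.collections
            for s in c.series
            for d in s.datasets
        ]


# Hypothetical content; the names are placeholders.
catalog = Catalog(
    "Example catalog",
    [
        Collection(
            "Collection A",
            [Series("Series A1", [Dataset("file-1"), Dataset("file-2")])],
        )
    ],
)
print([d.name for d in catalog.datasets()])
```

If an upper model were adopted, its buckets could plausibly map onto the collection level, but that choice is not settled in this thread.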

Here is a presentation they recently made to INSPIRE:

kzollove commented 3 months ago

Jay, Doug, and Steve have built a schema.org JSON-LD from LinkML (in a different context). They will meet separately to detail that pipeline and will then update the task to push non-functional metadata into JSON-LD after this meeting.

There are other apps that run around that process that may be helpful. Will detail these (db schemas, documentation)

DB tables/ schemas for capturing this metadata can be generated from schema.org JSON-LD. Natural language descriptions can be generated to describe these tables

Doug is exploring using graphs to analyze these

kzollove commented 3 months ago

This Use Case contributes directly to the GIS WG by developing an authoring environment for discovery metadata that will go into the staging database alongside catalog entries.

jaygee-on-github commented 3 months ago

@kzollove, we met on Tuesday. Tim, Doug Fils, Arofan Gregory and Jay were in attendance. We discussed metadata entry using YAML and forms. Tim demonstrated a recently developed DataCite form called DataCite Fabrica. We discussed middleware that would take us to JSON-LD and schema.org. Candidates included LinkML and RML.io technologies.

Doug is going to put together a preliminary proposal working with Tim and the various approaches Tim has either used or wants to consider for the metadata entry. I will check with Doug later this week before our Friday meeting on 4/5 to find out our ETA on the proposal.

jaygee-on-github commented 2 months ago

@kzollove, @martyalvarez, @AEW0330, @tibbben and @rtmill, we would like to present next week. We have two candidate authoring solutions. Both will support YAML or spreadsheet input and JSON-LD output, right now at the dataset level but extensible to the variable level.

The output is an empty instance of schema.org JSON-LD that can be aligned with any standard (more or less).

In one candidate the mapping is embedded in code, probably Python if I recall. In the other candidate the mapping is declarative.
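For readers weighing the two styles, here is a sketch of what "declarative" can mean in practice: the mapping is a plain table from input columns to schema.org properties, and one generic function applies it. This is neither of the two candidates and is not RML or LinkML; the column names, the `MAPPING` table, and the `row_to_jsonld` helper are all hypothetical.

```python
import csv
import io
import json

# The mapping is data, not code: spreadsheet column -> schema.org property.
# Column names are hypothetical.
MAPPING = {
    "title": "name",
    "summary": "description",
    "landing_page": "url",
    "licence": "license",
}


def row_to_jsonld(row, mapping):
    """Build a schema.org Dataset node from one spreadsheet row."""
    node = {"@context": "https://schema.org/", "@type": "Dataset"}
    for column, prop in mapping.items():
        value = (row.get(column) or "").strip()
        if value:  # skip empty cells rather than emit empty properties
            node[prop] = value
    return node


# Stand-in for a spreadsheet export; the record is illustrative only.
SHEET = """title,summary,landing_page,licence
Example dataset,An illustrative record,https://example.org/d/1,https://creativecommons.org/licenses/by/4.0/
"""

records = [row_to_jsonld(r, MAPPING) for r in csv.DictReader(io.StringIO(SHEET))]
print(json.dumps(records[0], indent=2))
```

With this shape, aligning to a different standard means editing `MAPPING` rather than the function, which is the maintainability argument for the declarative candidate. The embedded style would put the same column-to-property decisions inside the function body.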

We might want to talk about the maintainability of the two candidates.

jaygee-on-github commented 2 months ago

The design for the output schema.org is a little open-ended as a feature. We have experience with and are interested in following the Science on Schema.org metadata guidance endorsed by the ESIP Partner Assembly a couple of years back. This guidance is remarkably cross-domain.

The guidance can be found here. Note that some of the guidance is experimental, developed to address a few special use cases. We are thinking the experimental guidance may apply.

kzollove commented 1 month ago

@jaygee-on-github, once you find a time that works for you and Doug Fils (and whoever else should be present), please let us know and @martyalvarez can help set up the presentation on this work.

My preference is for a Friday meeting, but will join whenever! Thanks for all your work on this.

fils commented 1 month ago

Look forward to talking about these on the scheduled call.

Obviously YAML to JSON-LD (RDF) is doable, but so is CSV or just tabular data to RDF. I've been exploring RML (https://rml.io/), which allows for a declarative mapping from tabular (or structured) data to RDF. This would let people work in spreadsheets if they like and if that maps better to their current data model.

A forms based approach could also be used. Things like https://www.kobotoolbox.org/ are also possible alternatives to classic Google Forms.

Connecting such transforms with validation via SHACL is another topic that might be of interest.
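As a placeholder for that discussion, here is a plain-Python stand-in for the kind of check a SHACL shape would express with `sh:minCount 1` on a property. It is not SHACL and does not use a SHACL engine; the `REQUIRED` rule set and the `check` helper are assumptions for illustration, and which properties should actually be mandatory is not decided in this thread.

```python
# Hypothetical rule set: properties each node type must carry.
REQUIRED = {"Dataset": ["name", "description", "url"]}


def check(node, required=REQUIRED):
    """Return a list of problems; an empty list means the node passes."""
    problems = []
    node_type = node.get("@type")
    for prop in required.get(node_type, []):
        if node.get(prop) in (None, "", []):
            problems.append(f"{node_type} is missing required property '{prop}'")
    return problems


# Illustrative output of a transform step, missing two properties.
draft = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
}
for problem in check(draft):
    print(problem)
```

Running a check like this right after the tabular-to-JSON-LD transform would let authors see gaps before a record reaches the catalog; a real deployment would swap in SHACL shapes and a validator.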

I'll work up examples for the May 17th call.

fils commented 2 weeks ago

@kzollove @jaygee-on-github just FYI, we finally published the latest version of the document referenced in the original post on this thread. You can find it here: https://zenodo.org/records/11236871

During the editing of this document I was always keeping in mind how I would connect the UNESCO Ocean InfoHub (OIH) work to these guidelines. Part of the group's follow-on work is to start looking at implementation examples and documenting those. So, I'm happy to look them over in the context of this work as well as OIH.

Note that the guidance scopes the use of https://schema.org/StatisticalVariable along with the standard https://schema.org/variableMeasured.
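For anyone who has not seen the two side by side, here is a sketch of a dataset whose `variableMeasured` carries one variable in the long-standing `PropertyValue` form and the same variable as a `StatisticalVariable`. The property names (`populationType`, `measuredProperty`, `statType`, `unitText`) are from schema.org as I understand it; the values, the `example.org` identifier, and the `variable_types` helper are hypothetical, and this does not reproduce how the published guidance says to model variables.

```python
import json

dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example county health indicators",
    "variableMeasured": [
        {
            # Column-style description of the variable.
            "@type": "PropertyValue",
            "name": "median_age",
            "description": "Median age of residents",
            "unitText": "years",
        },
        {
            # Same variable broken into population, property and statistic.
            "@type": "StatisticalVariable",
            "name": "Median age of residents",
            "populationType": {"@id": "https://schema.org/Person"},
            "measuredProperty": {"@id": "https://example.org/property/age"},
            "statType": {"@id": "https://schema.org/median"},
        },
    ],
}


def variable_types(node):
    """List the @type of each entry under variableMeasured."""
    return [v["@type"] for v in node["variableMeasured"]]


print(json.dumps(dataset, indent=2))
print(variable_types(dataset))
```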

I am also meeting with the OBIS group (https://obis.org/) next week to talk about how we could align some of the discrete grid approaches we are working on. OBIS is developing what they call speciesgrids (https://github.com/iobis/speciesgrids) and I have been working on something similar to generate resources like the following.

image

I am hoping we can generate these products in line with the CODATA recommendations.

jaygee-on-github commented 2 weeks ago

Thanks, Doug

Jay

On Jun 13, 2024, at 3:07 PM, Douglas Fils wrote: [quoted text of the previous comment omitted]
