ResearchObject / ro-crate

Research Object Crate
https://w3id.org/ro/crate/
Apache License 2.0

Use Case: Use LinkML to define schemas #264

Open glyg opened 1 year ago

glyg commented 1 year ago

As a research software engineer, I want to use LinkML so that I can use its associated tooling to maintain my RO-Crate specification and automate data export.

Hi, I am working on deploying services to manage microscopy images across institutions. We mainly use OMERO to manage the data, but we lack a common representation for microscopy (meta)data records for interoperability with other tools.

Thus, I would like to use RO-Crate to define the record structure. On the other hand, the community is pushing to use LinkML as a schema definition tool, in the hope that it will make it easier to combine various sources of metadata recommendations.

It seems that using LinkML to define and create the RO-Crate could be a good entry point and benefit both communities.

Has anyone done that already?

I am now trying to define the RO-Crate schema in LinkML and use it to produce ro_crate_metadata.json (this is not working at the moment), with the command:

linkml-convert -s ro-crate-schema.yml -t rdf data.yml -o ro_crate_metadata.json

Here is what the inputs look like:

(incomplete) RO-Crate schema in LinkML (ro-crate-schema.yml):

id: https://w3id.org/ro/crate/1.1
name: ro-crate-linkml
prefixes:
  linkml: https://w3id.org/linkml/
  schema: http://schema.org/
  ro_crate: https://ro/crate/1.1
  ORCID: https://orcid.org/
imports:
  - linkml:types
default_curi_maps:
  - semweb_context
default_prefix: ro_crate
default_range: string

classes:
  Thing:
    class_uri: schema:Thing
    attributes:
      id:
        range: uriorcurie
      description:
        range: string

  CreativeWork:
    is_a: Thing
    class_uri: schema:CreativeWork
    attributes:
      conformsTo:
        range: uriorcurie
      about:
        range: uriorcurie

  DataEntity:
    is_a: Thing

  Dataset:
    is_a: DataEntity
    class_uri: ro_crate:Dataset
    attributes:
      hasPart:
        range: DataEntities

  RootDataEntity:
    is_a: Dataset
    tree_root: true

  File:
    is_a: DataEntity
    class_uri: ro_crate:File
    attributes:
      name:
        range: string
      contentSize:
        range: string
      encodingFormat:
        range: string

      sdDatePublished:
        range: string # should be isoformat date

  DataEntities:
    description: >-
      A list of Datasets and Files
    attributes:
      entries:
        range: DataEntity
        multivalued: true
        inlined: true

  Person:
    class_uri: schema:Person              ## reuse schema.org vocabulary
    attributes:
      id:
        identifier: true
      full_name:
        required: true
        description:
          name of the person
        slot_uri: schema:name             ## reuse schema.org vocabulary
    id_prefixes:
      - ORCID

Example data (not working), data.yml:

- id: ro-crate-metadata.json
  type: CreativeWork
  conformsTo:
    id: https://w3id.org/ro/crate/1.1
- id: ./
  type: RootDataEntity
  hasPart:
    - id: cp7glop.ai
      is_a: File
      name: "Diagram showing trend to increase"
      contentSize: "383766"
      description: "Illustrator file for Glop Pot"
      encodingFormat: "application/pdf"
    - id: lots_of_little_files/
      is_a: Dataset
      name: "Too many files"
      description: "This directory contains many small files, that we're not going to describe in detail."

Any thoughts, pointers or hints on how to achieve that are welcome! (I am referencing this in a LinkML issue.)

Thanks :) Guillaume

stain commented 1 year ago

Thanks for the suggestion! This fits well with what we're proposing for profiles https://www.researchobject.org/ro-crate/1.2-DRAFT/profiles as well.

Bioschemas have tried using DDE https://github.com/BioSchemas/specifications/ which is also related.

LinkML seems quite approachable to edit compared to SHACL and ShEx. See also some thoughts on those in https://github.com/ResearchObject/runcrate/pull/17

As for File, you should define it as http://schema.org/MediaObject (aka schema:MediaObject with your prefixes), since we don't have any ro_crate terms. The mapping for additional terms from our context is clarified in https://www.researchobject.org/ro-crate/1.2-DRAFT/metadata.html#additional-metadata-standards
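
To illustrate the mapping described above, here is a minimal Python sketch that looks up the schema.org IRI behind an RO-Crate type name and rewrites it as a CURIE usable as a LinkML class_uri. The table is a small hand-copied subset of the RO-Crate 1.1 JSON-LD context, and the helper names class_uri_for and as_curie are hypothetical, made up for this example; they are not part of LinkML or any RO-Crate library.

```python
# Illustrative subset of RO-Crate 1.1 context terms, copied by hand.
# The authoritative list is the RO-Crate JSON-LD context itself.
RO_CRATE_TERMS = {
    "File": "http://schema.org/MediaObject",
    "Dataset": "http://schema.org/Dataset",
    "CreativeWork": "http://schema.org/CreativeWork",
    "Person": "http://schema.org/Person",
}


def class_uri_for(term: str) -> str:
    """Return the IRI that an RO-Crate type name stands for (hypothetical helper)."""
    try:
        return RO_CRATE_TERMS[term]
    except KeyError:
        raise KeyError(f"{term!r} is not in this illustrative subset") from None


def as_curie(iri: str, prefix: str = "schema", base: str = "http://schema.org/") -> str:
    """Shorten an IRI to prefix:LocalName, assuming it starts with the given base."""
    if not iri.startswith(base):
        raise ValueError(f"{iri!r} does not start with {base!r}")
    return f"{prefix}:{iri[len(base):]}"


if __name__ == "__main__":
    for term in RO_CRATE_TERMS:
        print(f"{term}: class_uri: {as_curie(class_uri_for(term))}")
```

With the prefixes from the schema in this issue, the File class would then carry schema:MediaObject rather than ro_crate:File.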

stain commented 1 year ago

You should also use the filename ro-crate-metadata.json as specified by https://www.researchobject.org/ro-crate/1.1/structure.html. I have not come across the underscore version before. Some tools may require a .jsonld suffix to generate it.
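
For reference, this is a rough Python sketch of the kind of file the conversion is aiming for, built by hand with the standard library only and without LinkML. It reuses the entities from the example data in this issue and writes them as a flat JSON-LD @graph to ro-crate-metadata.json. The function names build_metadata and write_metadata are hypothetical, and the sketch is deliberately minimal: other properties the RO-Crate specification asks for on the root dataset are left out.

```python
import json
from pathlib import Path


def build_metadata() -> dict:
    """Build a minimal RO-Crate 1.1 style document from the example entities."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # metadata descriptor: points at the root dataset via "about"
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # root data entity: parts are referenced by @id, not nested
                "@id": "./",
                "@type": "Dataset",
                "hasPart": [{"@id": "cp7glop.ai"}, {"@id": "lots_of_little_files/"}],
            },
            {
                "@id": "cp7glop.ai",
                "@type": "File",
                "name": "Diagram showing trend to increase",
                "contentSize": "383766",
                "description": "Illustrator file for Glop Pot",
                "encodingFormat": "application/pdf",
            },
            {
                "@id": "lots_of_little_files/",
                "@type": "Dataset",
                "name": "Too many files",
                "description": "This directory contains many small files, "
                               "that we're not going to describe in detail.",
            },
        ],
    }


def write_metadata(crate_dir: str) -> Path:
    """Write the document to <crate_dir>/ro-crate-metadata.json and return the path."""
    path = Path(crate_dir) / "ro-crate-metadata.json"
    path.write_text(json.dumps(build_metadata(), indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(write_metadata("."))
```

Note that the nested hasPart entries of data.yml become @id references here, with each file or directory described as its own entry in the graph.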

glyg commented 1 year ago

Thanks a lot for the feedback, @stain. I'll fix those (the underscores are a typo on my part). I'll try to make some progress and come back here.

glyg commented 11 months ago

Hi, sorry for the stale issue. I posted a brief review of my attempt on forum.image.sc

It's OK for me to close this if you feel it clutters your repo :), although I still feel something should be done, but I have no clue how...

stain commented 3 months ago

Some more discussion in https://forum.image.sc/t/ro-crate-and-omero/80610/11