ResearchObject / ro-crate

Research Object Crate
https://w3id.org/ro/crate/
Apache License 2.0

Use Case: indicating that a Data Entity conforms to a particular specification #115

Open paulwalk opened 3 years ago

paulwalk commented 3 years ago

As a general user of RO-Crates, I want to include Data Entities that conform to a particular schema or standard and are identifiable as such (i.e. the resource conforms to specification a), so that I can determine whether a given RO-Crate contains a Data Entity which conforms to specification a.

Example

I want to include a machine-actionable Data Management Plan (maDMP) in my RO-Crate, where the maDMP will conform to the specification described here.

I may (but not necessarily) want to provide a URL to the "live" maDMP, and/or I may (but not necessarily) want to provide a serialised copy or "snapshot" of that maDMP (as a single file). Essentially, the maDMP may be "passed by reference" and/or "passed by copy".

From the RO-Crate documentation, I refer to the Embedded data entities that are also on the web section, which advises something like:

  {
    "@id": "my_maDMP.json",
    "@type": "File",
    "name": "My Groovy Machine Actionable Data management Plan",
    "encodingFormat": "application/json",
    "url": "http://example.com/downloads/my_maDMP.json",
    "subjectOf": {"@id": "http://example.com/ma_dmps/my_maDMP.html"}
  }

This then allows both the serialised maDMP and the URL to the remote resource to be included, which is great. What I'm unclear about is how I would indicate (in a machine readable way) that this Data Entity is a maDMP conforming with this particular specification.

I guess what I'm looking for is a generalised approach, something like as above but adding in:

  {
    "conformsTo": {"@id": "https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard"}
  }

However, there does not seem to be anything like "conformsTo" in the RO-Crate context, and schema.org does not have http://schema.org/conformsTo either. This is probably because dcterms:conformsTo exists already.

So, one solution might be to add a mapping in the RO-Crate context, something like:

"conformsTo": "http://purl.org/dc/terms/conformsTo",

What do you think?
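A minimal Python sketch of how this proposal could look once serialised, assuming the mapping is supplied as an extra local context entry alongside the RO-Crate 1.1 context. The variable names are hypothetical, and the graph is only a fragment (no root dataset or metadata descriptor):

```python
import json

# Assumption: "conformsTo" is mapped to the DC Terms property through an
# extra local context entry, next to the RO-Crate 1.1 context.
context = [
    "https://w3id.org/ro/crate/1.1/context",
    {"conformsTo": "http://purl.org/dc/terms/conformsTo"},
]

# The maDMP data entity from the example above, plus the proposed property.
madmp_entity = {
    "@id": "my_maDMP.json",
    "@type": "File",
    "name": "My Groovy Machine Actionable Data management Plan",
    "encodingFormat": "application/json",
    "url": "http://example.com/downloads/my_maDMP.json",
    "subjectOf": {"@id": "http://example.com/ma_dmps/my_maDMP.html"},
    "conformsTo": {
        "@id": "https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard"
    },
}

# Fragment only: a real crate also needs the root dataset and descriptor.
crate_fragment = {"@context": context, "@graph": [madmp_entity]}
print(json.dumps(crate_fragment, indent=2))
```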

(Disclaimers)

  1. I am Director of the DublinCore Metadata Initiative
  2. I am co-chair of the RDA DMP Common Standard Working Group

TomMiksa commented 3 years ago

Have there been any discussions or decisions on this topic outside of this thread?

We have been exploring other use cases for integration between RO-Crates and maDMPs (#88), but the one described by Paul in this issue seems to be "the easiest" and "the cleanest".

I am very interested in exploring this use case further.

stain commented 3 years ago

We discussed this in the call today with regards to profiles as well, and @marc-portier pointed out that the base level here is just to identify a profile by URI, which is the same as what @paulwalk is suggesting.

In workflows we recommend using conformsTo (from DC Terms) to indicate conformance with the Bioschemas profile:

{ "@id": "workflow/alignment.knime",
  "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
  "conformsTo":
    {"@id": "https://bioschemas.org/profiles/ComputationalWorkflow/0.5-DRAFT-2020_07_21/"}
}

This mirrors our own self-declaration on the RO-Crate Metadata file:


    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"},
      "description": "RO-Crate Metadata File Descriptor (this file)"
    }

Similarly in BCO RO-Crate we used conformsTo to indicate that another file in the crate follows the IEEE 2791 JSON Schema:

{
    "@id": "chipseq_20200910.json",
    "@type": "File",
    "name": "chipseq_20200910.json",
    "description": "IEEE 2791 description (BioCompute Object) of nf-core/chipseq",
    "conformsTo": {
        "@id": "https://w3id.org/ieee/ieee-2791-schema/"
    }
}
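With entities declared like this, the original use case (finding out whether a crate contains a Data Entity conforming to a given specification) becomes a simple scan of the graph. A rough Python sketch, assuming a flattened `@graph` list as in the snippets above; `find_conforming` and `as_list` are hypothetical helper names, not part of any RO-Crate library:

```python
def as_list(value):
    """Normalise a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_conforming(graph, spec_id):
    """Return the @id of every entity whose conformsTo references spec_id."""
    matches = []
    for entity in graph:
        for ref in as_list(entity.get("conformsTo")):
            # A reference is normally {"@id": ...}; tolerate a bare string too.
            ref_id = ref.get("@id") if isinstance(ref, dict) else ref
            if ref_id == spec_id:
                matches.append(entity.get("@id"))
                break
    return matches


# Trimmed-down copies of the two entities shown above.
graph = [
    {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    },
    {
        "@id": "chipseq_20200910.json",
        "@type": "File",
        "conformsTo": {"@id": "https://w3id.org/ieee/ieee-2791-schema/"},
    },
]

print(find_conforming(graph, "https://w3id.org/ieee/ieee-2791-schema/"))
```

The comparison is by exact URI string, so a trailing slash or a version suffix in the profile URI matters.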

There is of course a bit of overlap here with encodingFormat, so I think a section about conformsTo would fit in there, contrasting the two. I would say the difference is that https://schema.org/encodingFormat describes the file syntax (say, "CSV") so that the file can be parsed in a certain way, while a profile is more open-ended and says how that CSV should be interpreted.

This is similar to profile= on IANA media types; see for instance JSON-LD, where RO-Crate itself uses, and is, a JSON-LD profile.
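To illustrate the media type analogy, here is a small Python sketch that separates the syntax (the media type) from the profile parameter. It uses the standard library's header parser; the header value is an assumed example built from the flattened and compacted profile URIs defined by JSON-LD 1.1:

```python
from email.message import Message

# Assumed example header: JSON-LD syntax, with the profile parameter
# listing the "flattened" and "compacted" profile URIs from JSON-LD 1.1.
header = (
    'application/ld+json; '
    'profile="http://www.w3.org/ns/json-ld#flattened '
    'http://www.w3.org/ns/json-ld#compacted"'
)

msg = Message()
msg["Content-Type"] = header

# The media type says how to parse the bytes (comparable to encodingFormat).
media_type = msg.get_content_type()

# The profile parameter is a space-separated list of URIs that say how the
# parsed content is shaped (comparable to conformsTo).
profiles = (msg.get_param("profile") or "").split()

print(media_type)
print(profiles)
```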

@TomMiksa, do you think conformsTo would be sufficient as a start? This leaves the particular format of the profile open-ended: it could just be an HTML page as above.

Should the profile indicated also be further detailed as its own CreativeWork just like we do for license? This allows the profile document itself to be declared as being a particular format which could be a JSON Schema or something else formal.
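As a sketch of what that could look like, the Python snippet below adds a contextual entity for the profile and resolves the conformsTo reference to it. The `name` and `encodingFormat` values on the profile entity are assumptions for illustration only; nothing in this thread prescribes them:

```python
graph = [
    {
        "@id": "chipseq_20200910.json",
        "@type": "File",
        "conformsTo": {"@id": "https://w3id.org/ieee/ieee-2791-schema/"},
    },
    {
        # Hypothetical contextual entity describing the profile itself,
        # in the same way a license is described as its own CreativeWork.
        "@id": "https://w3id.org/ieee/ieee-2791-schema/",
        "@type": "CreativeWork",
        "name": "IEEE 2791 JSON Schema",              # assumed label
        "encodingFormat": "application/schema+json",  # assumed media type
    },
]

# Index the flattened graph by @id so references can be followed.
by_id = {entity["@id"]: entity for entity in graph}

data_entity = by_id["chipseq_20200910.json"]
profile = by_id.get(data_entity["conformsTo"]["@id"])

# A consumer could use the profile's own format to decide how to validate.
if profile is not None:
    print(profile["@type"], profile["encodingFormat"])
```

If the profile is not described in the crate, `by_id.get` returns `None` and the consumer only has the bare URI, which matches the base case of identifying a profile by URI.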

stain commented 3 years ago

Side-tracked: Why conformsTo from DC Terms?

Defined as "An established standard to which the described resource conforms." pointing to a dcterms:Standard document.

There is nothing similar in schema.org, although https://schema.org/MediaObject and supertype https://schema.org/CreativeWork has:

paulwalk commented 3 years ago

I think the inclusion of the DMP by reference (URI) and using the conformsTo property to indicate that it is one of "our" Common Standard DMPs is a good solution, in keeping with JSON-LD (and linked-data) conventions.

(On a slight tangent to this, I have re-opened the discussion about defining the DMP Common Standard as its own JSON-LD context).