ResearchObject / ro-crate

Research Object Crate
https://w3id.org/ro/crate/
Apache License 2.0

How to reference and retrieve another RO-Crate #296

Open stain opened 2 months ago

stain commented 2 months ago

This PR fixes #228 and #160.

Generalizes the content-negotiation-or-Signposting section so that it applies to any RO-Crate, not just Profile Crates.

For ZIP files this is still vague, in that it says: "If the retrieved resource is a ZIP file (Content-Type: application/zip), then extract ro-crate-metadata.json, or, if the archive root only contains a single folder (e.g. folder1/), extract folder1/ro-crate-metadata.json".
I've also added a BagIt reference, as in a bag the metadata file would sit one folder deeper, e.g. folder1/data/ro-crate-metadata.json, and the checksums should then be verified first, as we do in https://trefx.uk/5s-crate/0.4/#check-phase
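
The lookup order described above could be sketched roughly as follows. This is only an illustration, not the wording of the PR: the function names find_metadata_path and read_metadata are made up here, the bagit.txt check is an assumption about how to recognise a bag, and the checksum verification mentioned above is deliberately left out.

```python
import json
import zipfile
from typing import Optional

METADATA = "ro-crate-metadata.json"


def find_metadata_path(names: list[str]) -> Optional[str]:
    """Return the archive path of the metadata file, or None if not found.

    `names` is the list of member names of a ZIP archive,
    e.g. as returned by zipfile.ZipFile.namelist().
    """
    # Case 1: the metadata file is directly in the archive root.
    if METADATA in names:
        return METADATA

    # Case 2: the archive root contains a single folder (e.g. folder1/).
    top_level = {name.split("/", 1)[0] for name in names if name}
    if len(top_level) != 1:
        return None
    folder = top_level.pop()
    if f"{folder}/{METADATA}" in names:
        return f"{folder}/{METADATA}"

    # Case 3 (assumption): the single folder is a BagIt bag, recognised
    # here by its bagit.txt, with the crate under data/.
    # A real client should verify the bag's checksums before trusting this.
    if f"{folder}/bagit.txt" in names and f"{folder}/data/{METADATA}" in names:
        return f"{folder}/data/{METADATA}"
    return None


def read_metadata(zip_file) -> dict:
    """Load the metadata JSON-LD from a ZIP path or file-like object."""
    with zipfile.ZipFile(zip_file) as zf:
        path = find_metadata_path(zf.namelist())
        if path is None:
            raise FileNotFoundError(f"No {METADATA} found in archive")
        return json.loads(zf.read(path))
```

Keeping the path lookup separate from the ZIP handling means the ordering rules can be checked on plain lists of names, without needing an actual archive.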

As for referencing one RO-Crate from another, either the referenced RO-Crate can have its own distribution with a conformsTo:

  {
    "@id": "./",
    "@type": "Dataset",
    "identifier": "https://doi.org/10.48546/workflowhub.workflow.775.1",
    "url": "https://workflowhub.eu/workflows/775/ro_crate?version=1",
    "name": "Research Object Crate for Jupyter Notebook Molecular Structure Checking",
    "distribution": {"@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1"},
    "…": ""
  },
  {
    "@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1",
    "@type": "DataDownload",
    "encodingFormat": ["application/zip", {"@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263"}],
    "conformsTo": { "@id": "https://w3id.org/ro/crate" }
  }

or it can have a subjectOf pointing to its ro-crate-metadata.json:

{
  "@id": "http://example.com/another-crate/",
  "@type": "Dataset",
  "conformsTo": { "@id": "https://w3id.org/ro/crate" },
  "subjectOf": { "@id": "http://example.com/another-crate/ro-crate-metadata.json" }
},
{
  "@id": "http://example.com/another-crate/ro-crate-metadata.json",
  "@type": "CreativeWork",
  "encodingFormat": "application/ld+json"
}

As used by the 5s-crate profile: https://trefx.uk/5s-crate/0.4/#referencing-a-workflow-crate
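
To illustrate how a consumer might follow either of the two patterns above, here is a minimal sketch. The function name retrieval_url and the helpers are hypothetical, the order in which the patterns are tried is an arbitrary choice, and accepting versioned identifiers (such as a conformsTo that merely starts with https://w3id.org/ro/crate/) is an assumption rather than something this PR specifies.

```python
from typing import Optional

RO_CRATE = "https://w3id.org/ro/crate"


def as_list(value) -> list:
    """Normalise a JSON-LD value that may be missing, single or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ids(value) -> list[str]:
    """Collect the @id of every reference object in a JSON-LD value."""
    return [v["@id"] for v in as_list(value) if isinstance(v, dict) and "@id" in v]


def conforms_to_ro_crate(entity: dict) -> bool:
    # Assumption: versioned identifiers below the base URI also count.
    return any(i == RO_CRATE or i.startswith(RO_CRATE + "/")
               for i in ids(entity.get("conformsTo")))


def retrieval_url(graph: list[dict], dataset_id: str) -> Optional[str]:
    """Return a URL from which the referenced crate could be retrieved."""
    by_id = {e["@id"]: e for e in graph if "@id" in e}
    dataset = by_id.get(dataset_id)
    if dataset is None:
        return None

    # Pattern 1: a distribution (e.g. a ZIP download) that itself
    # declares conformsTo RO-Crate.
    for dist_id in ids(dataset.get("distribution")):
        if conforms_to_ro_crate(by_id.get(dist_id, {})):
            return dist_id

    # Pattern 2: the Dataset declares conformsTo RO-Crate and has a
    # subjectOf pointing to a JSON-LD metadata file.
    if conforms_to_ro_crate(dataset):
        for meta_id in ids(dataset.get("subjectOf")):
            formats = as_list(by_id.get(meta_id, {}).get("encodingFormat"))
            if "application/ld+json" in formats:
                return meta_id
    return None
```

For the first example above, retrieval_url(graph, "./") would give the WorkflowHub ZIP URL; for the second, retrieval_url(graph, "http://example.com/another-crate/") would give the ro-crate-metadata.json URL.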

stain commented 1 month ago

Could @dgarijo or @ptsefton have a look at this? I've used it here: https://stain.github.io/workflow-run-crate/profiles/0.5-DRAFT/process_run_crate/ro-crate-preview.html#https%3A//www.researchobject.org/workflow-run-crate-paper/mapping/

Perhaps we should add that isPartOf pattern as well on how to mention a file within another crate? (Could get tricky to make absolute URIs..)

dgarijo commented 1 month ago

Will do when I finish the ISWC reviews that are due tomorrow :(((

dgarijo commented 1 month ago

Thanks @stain, I have had a look. The only thing that is not fully clear to me is where the distribution information is supposed to be added: is it in the crate referencing the other crate, or in the referenced crate's metadata?

For example, let's say crate A references crate B. Usually I would add a link in A to B. But here you recommend also adding where B is stored, correct? That is, as opposed to only adding a link to B and hoping that, when I resolve the id, I get a JSON-LD document with the distribution information.

The only potential issue I see is that distributions may not have persistent ids. If the link from A to B persists, but the distribution has moved to a different host in the meantime, B has no means to tell A about this. But I am ok with this limitation.