Closed dgarijo closed 1 month ago
https://signposting.org/adopters/#workflowhub documents how we do this with Signposting in WorkflowHub. Could we generalize this?
Let's make a new section for Retrieving RO-Crate and move out some of the content negotiation described in https://www.researchobject.org/ro-crate/1.2-DRAFT/profiles#how-to-retrieve-a-profile-crate to perhaps allow for both application/zip and application/ld+json.
We can then add signposting, particularly where the persistent identifier has an HTML landing page (which may be ro-crate-preview.html as suggested by Profile Crate) -- see #160
See also #149
Not sure we should close this, as we don't detail what to expect in the zip file.
@dgarijo -- is the text in https://www.researchobject.org/ro-crate/1.2-DRAFT/root-data-entity.html#root-data-entity-identifier sufficient for 1.2 to close this?
Here's one take with BagIt: https://trefx.uk/trusted-wfrun-crate/0.3/#archive-serialisation which assumes a single folder (with an arbitrary name) that in turn contains bagit.txt and manifest-sha512.txt with checksums, and then data/ro-crate-metadata.json -- I'm trying to formalize this into an update of the https://github.com/ResearchObject/bagit-ro profile, but it is mostly already in https://www.researchobject.org/ro-crate/1.2-DRAFT/appendix/implementation-notes.html#adding-ro-crate-to-bagit
Then Workflow RO-Crate has a different take, where the ZIP file has no top-level directory at all (that is, ro-crate-metadata.json and the other files are directly in the ZIP root). This is easy to access programmatically, but may give some classical unzip users a surprise, as the current directory will be filled with multiple files. (I think the Windows/macOS integrations will make a folder for you.)
ROHub also exports directly with ro-crate-metadata.json in the root.
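To make the two layouts above concrete, here is a rough Python sketch of how a consumer might locate the metadata file in either one. The function names and the exact detection rules are my own assumptions for illustration, not something the spec or any of the linked profiles define.

```python
import io
import zipfile
from typing import Optional

METADATA = "ro-crate-metadata.json"


def find_metadata_path(zf: zipfile.ZipFile) -> Optional[str]:
    """Return the archive path of ro-crate-metadata.json, or None if not found."""
    # Ignore explicit directory entries; only file paths matter here.
    names = [n for n in zf.namelist() if not n.endswith("/")]

    # "Plain" layout: metadata file directly in the ZIP root.
    if METADATA in names:
        return METADATA

    # BagIt layout (assumed): exactly one top-level folder with an arbitrary
    # name, holding bagit.txt and the crate under data/.
    if not names or any("/" not in n for n in names):
        return None
    tops = {n.split("/", 1)[0] for n in names}
    if len(tops) != 1:
        return None
    top = tops.pop()
    candidate = f"{top}/data/{METADATA}"
    if f"{top}/bagit.txt" in names and candidate in names:
        return candidate
    return None


def find_metadata_in_bytes(data: bytes) -> Optional[str]:
    """Convenience wrapper for a ZIP that was downloaded into memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return find_metadata_path(zf)
```

A consumer that has to guess like this is exactly why a profile on the download link would help: with a declared profile, only one of the two branches would need to be tried.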
As I listed in https://trefx.uk/trusted-wfrun-crate/0.3/#zip-expectations certain ZIP features should not be used, e.g. multipart (for floppies!), ZIP64 extensions are needed for larger than 2 GB, etc. These are documented fairly well in https://www.w3.org/publishing/epub32/epub-ocf.html#sec-zip-container-zipreqs
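As an illustration of what checking such ZIP expectations could look like, here is a small Python sketch. The two rules it applies (no encrypted entries, only stored or Deflate compression) are my reading of the EPUB OCF list linked above, taken as an assumption; which rules an RO-Crate profile should actually adopt is still open. As far as I know Python's zipfile module does not handle multi-disk archives at all, so that case is not covered here.

```python
import zipfile
from typing import Iterable, List

# Assumed whitelist: uncompressed ("stored") and Deflate entries only.
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def zip_feature_problems(entries: Iterable[zipfile.ZipInfo]) -> List[str]:
    """Return human-readable problems found in the given ZIP entries."""
    problems = []
    for info in entries:
        # Bit 0 of the general purpose flags marks an encrypted entry.
        if info.flag_bits & 0x1:
            problems.append(f"{info.filename}: encrypted entry")
        if info.compress_type not in ALLOWED_METHODS:
            problems.append(
                f"{info.filename}: compression method {info.compress_type}"
            )
    return problems
```

For a real archive this would be called as `zip_feature_problems(zipfile.ZipFile(path).infolist())`, where `path` is whatever file was downloaded.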
I'm starting to think that we need multiple profiles depending on whether it's a BagIt-wrapping ZIP, a "plain" RO-Crate, or a detached RO-Crate JSON-LD.
A ZIP archive with ro-crate-metadata.json in the root:
Link: <https://example.com/workflows/419/ro_crate.zip> ;
rel="item" ;
type="application/zip" ;
profile="https://w3id.org/ro/crate#archive"
(or make a new w3id PID space for that)
A BagIt ZIP according to https://www.researchobject.org/ro-crate/1.2-DRAFT/appendix/implementation-notes.html#adding-ro-crate-to-bagit, aka foo-something/data/ro-crate-metadata.json:
Link: <https://example.com/workflows/419/bagit.zip> ;
rel="item" ;
type="application/zip" ;
profile="https://w3id.org/ro/bagit/profile/0.3"
An RO-Crate Metadata Document straight on the web (Detached or Attached):
Link: <https://example.com/workflows/419/ro-crate-metadata.json> ;
rel="item" ;
type="application/ld+json" ;
profile="https://w3id.org/ro/crate"
And then only the final one corresponds to the profile registered in https://www.iana.org/assignments/profile-uris/profile-uris.xhtml as a JSON-LD profile.
In either case, when retrieving, the profile will be provided as a Link header, as described in https://trefx.uk/trusted-wfrun-crate/0.3/#media-type-and-profiles
GET http://example.com/crates/42.zip HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/zip
Link: <https://w3id.org/ro/crate#archive>; rel="profile"
Or from a landing page, with signposting as above:
HEAD http://example.com/crates/42.html HTTP/1.1
HTTP/1.1 200 OK
Content-Type: text/html
Link: <https://example.com/query-12389.zip>; rel="item"; type="application/zip"
Link: <https://w3id.org/ro/crate>; rel="profile"; type="application/zip";
anchor="https://example.com/query-12389.zip"
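On the client side, following such signposting could look roughly like the Python sketch below. It is a deliberately simplified Link header parser (regex-based, assuming semicolon-separated parameters and no commas inside unquoted values), and the function names are hypothetical; a real client would probably use an existing Web Linking or Signposting library.

```python
import re
from typing import Dict, List

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)')
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_links(header_values: List[str]) -> List[Dict[str, str]]:
    """Parse Link header values into dicts with 'target' plus parameters."""
    links = []
    for value in header_values:
        for target, params in _LINK.findall(value):
            link = {"target": target}
            for name, quoted, bare in _PARAM.findall(params):
                # Keep the first occurrence of a repeated parameter.
                link.setdefault(name.lower(), quoted or bare)
            links.append(link)
    return links


def zip_items_with_profiles(links: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each rel="item" ZIP link to the profiles anchored on it."""
    result = {}
    for link in links:
        is_item = "item" in link.get("rel", "").split()
        if is_item and link.get("type") == "application/zip":
            result[link["target"]] = [
                p["target"]
                for p in links
                if "profile" in p.get("rel", "").split()
                and p.get("anchor") == link["target"]
            ]
    return result
```

Fed with the two Link headers from the landing page example, this would pair https://example.com/query-12389.zip with the profile https://w3id.org/ro/crate, so the client knows which ZIP layout to expect before downloading.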
Hmm, you may be correct, although it complicates things a little.
From my end, I am interested in knowing what to prepare when someone asks for one of my ROs with permanent ids.
For example, for https://w3id.org/dgarijo/ro/sepln2022 I set up the JSON-LD (the RO-Crate metadata file) and the HTML. But I did not find a recommendation on how to create the ZIP file when I last browsed the spec.
The text in https://www.researchobject.org/ro-crate/1.2-DRAFT/root-data-entity.html#root-data-entity-identifier points me to https://www.researchobject.org/ro-crate/1.2-DRAFT/profiles.html#how-to-retrieve-a-profile-crate, but it is not clear how I should structure the contents of the zip file.
Also, should my root data entity contain a link to the ZIP file with the downloadable RO-Crate? Maybe using the schema.org distribution properties used for datasets.
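To make the distribution idea concrete, here is a sketch in Python of what I have in mind: the root data entity points to a schema.org DataDownload entity for the ZIP. The helper name and the ZIP URL are hypothetical, and I am assuming the metadata descriptor has the usual ro-crate-metadata.json identifier; whether this is the pattern the spec should recommend is exactly my question.

```python
import copy


def add_zip_distribution(crate: dict, zip_url: str) -> dict:
    """Return a copy of the crate whose root entity links to a ZIP download."""
    crate = copy.deepcopy(crate)
    graph = crate["@graph"]

    # Find the root data entity via the metadata descriptor's "about".
    descriptor = next(e for e in graph if e.get("@id") == "ro-crate-metadata.json")
    root_id = descriptor["about"]["@id"]
    root = next(e for e in graph if e.get("@id") == root_id)

    # schema.org: Dataset.distribution -> DataDownload
    root["distribution"] = {"@id": zip_url}
    graph.append(
        {
            "@id": zip_url,
            "@type": "DataDownload",
            "encodingFormat": "application/zip",
        }
    )
    return crate
```

For my example this would be called with something like `add_zip_distribution(crate, "https://example.com/ro/sepln2022.zip")`, where that URL is made up.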
As a programmer, I want to obtain the aggregated contents of a Research Object as a downloadable resource.
Ideally, I would like to do so through a request and content negotiation. But I do not see an agreement about how to serve the RO-Crate itself. Can we agree on something like application/zip? Can we have some community-agreed guidelines?
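As a sketch of what I mean by content negotiation on the server side, the Python below picks a representation from the Accept header. The list of offered media types and the function names are my assumptions, and the Accept parsing is simplified: it honours q-values and wildcards but ignores other parameters such as profile.

```python
from typing import List, Optional, Tuple

# Assumed representations of one Research Object, in server preference order.
OFFERED = ["text/html", "application/ld+json", "application/zip"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        prefs.append((fields[0].lower(), q))
    return prefs


def choose_media_type(accept: str, offered: List[str] = OFFERED) -> Optional[str]:
    """Return the offered type with the highest q, or None if none is acceptable."""
    prefs = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in offered:
        main_type = media_type.split("/")[0]
        q, specificity = 0.0, -1
        for pattern, pattern_q in prefs:
            if pattern == media_type:
                level = 2
            elif pattern == main_type + "/*":
                level = 1
            elif pattern == "*/*":
                level = 0
            else:
                continue
            # The most specific matching range decides the q-value.
            if level > specificity:
                specificity, q = level, pattern_q
        if q > best_q:
            best, best_q = media_type, q
    return best
```

With this, a request with `Accept: application/zip` would get the archive, `Accept: application/ld+json` the metadata file, and a browser would still land on the HTML page.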