Entry Points for RO-Crate Profiles

floWetzels commented 2 weeks ago

In the current specification, profiles are always defined on a complete RO-Crate. If such a profile specifies requirements on the root data entity, it cannot be used in a modular or composable way, since subdirectories in RO-Crates don't specify a root data entity. An example of such a profile is the Workflow-Run-RO-Crate.

This issue came up at the BioHackathon Europe 2024, project 19 (@elichad @dnlbauer @floWetzels). Since composed research objects seem to be common, and it should be possible to model them in RO-Crates without redundancy in profiles, we propose introducing an Entrypoint mechanism into the RO-Crate specification. See https://github.com/dnlbauer/bh24-ro-crate-extension for details (will be fleshed out in the future).
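To make the problem concrete, here is a minimal sketch of crate metadata in which two subdirectories each conform to a profile while the root data entity does not. The profile URI, the directory names, and the use of an extra `EntryPoint` type as the marker are assumptions for illustration only; the linked proposal may settle on a different marker, and required root properties such as license and date are omitted for brevity.

```python
import json

# Hypothetical profile URI, for illustration only.
RUN_PROFILE = "https://example.org/profiles/workflow-run"

crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root data entity: makes no profile claim of its own.
            "@id": "./",
            "@type": "Dataset",
            "name": "Composed research object",
            "hasPart": [{"@id": "run-1/"}, {"@id": "run-2/"}],
        },
        {
            # Subdirectory acting as the "root" for the profile.
            # The extra "EntryPoint" type is an assumed marker, not settled.
            "@id": "run-1/",
            "@type": ["Dataset", "EntryPoint"],
            "name": "First workflow run",
            "conformsTo": {"@id": RUN_PROFILE},
        },
        {
            "@id": "run-2/",
            "@type": ["Dataset", "EntryPoint"],
            "name": "Second workflow run",
            "conformsTo": {"@id": RUN_PROFILE},
        },
    ],
}

print(json.dumps(crate, indent=2))
```

With this layout the profile's requirements would apply to `run-1/` and `run-2/` rather than to `./`, which is what the current specification has no way to express.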

Subscribe to this issue to stay updated on the development.

elichad commented 1 week ago

To consider:

- can/should this work recursively? Having an entry point inside another entry point?
- what should happen if multiple entry points conform to the same profile? e.g. in the case of uploading a crate with multiple workflow entry points to WorkflowHub

dnlbauer commented 1 week ago

what should happen if multiple entry points conform to the same profile? e.g. in the case of uploading a crate with multiple workflow entry points to WorkflowHub

In this case, the service could decide to decompose the RO-Crate into "atomic" (for lack of a better term) RO-Crates that can be handled like normal RO-Crates. For example, WorkflowHub could create multiple entries, one for each entry point in the crate; a workflow execution engine executing a workflow from an RO-Crate could present a drop-down selection to the user, or require an entry point to be specified during workflow submission.
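The decomposition idea could be sketched roughly as follows: a service collects every entity that declares conformance to a profile it knows, then treats each one as its own entry. The function names, the profile URI, and the `known_profiles` parameter are hypothetical; nothing here reflects how WorkflowHub actually works.

```python
from collections import defaultdict


def ids_of(value):
    """Normalise a JSON-LD property value to a list of @id strings."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [item["@id"] if isinstance(item, dict) else item for item in items]


def entry_points_by_profile(graph, known_profiles):
    """Map each known profile URI to the @ids of entities conforming to it."""
    found = defaultdict(list)
    for entity in graph:
        for profile in ids_of(entity.get("conformsTo")):
            if profile in known_profiles:
                found[profile].append(entity["@id"])
    return dict(found)


# Hypothetical profile URI and crate content, for illustration only.
PROFILE = "https://example.org/profiles/workflow"
graph = [
    {"@id": "./", "@type": "Dataset"},
    {"@id": "wf-a/", "@type": "Dataset", "conformsTo": {"@id": PROFILE}},
    {"@id": "wf-b/", "@type": "Dataset", "conformsTo": [{"@id": PROFILE}]},
]

# A service could create one entry per entry point, or offer these as a
# selection list to the user.
for profile, entries in entry_points_by_profile(graph, {PROFILE}).items():
    for entry_id in entries:
        print(f"would register {entry_id} as a separate entry for {profile}")
```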

elichad commented 1 week ago

Intending to discuss this issue at the RO-Crate community call at 8:00 UTC tomorrow

marc-portier commented 1 week ago

Missed the meeting and thus the discussion in the call. Still sharing my two cents.

From a semantic point of view, nothing prevents you from declaring additional triples with the dcterms:conformsTo predicate attached to any available part (subject) in the graph. And I don't think the RO-Crate spec formulates any restriction on that either. If anything, the JSON-LD context just makes it handy to use conformsTo keys in the JSON-LD to add these.

Its value is expected to contain an identifier (URI) for a standard that

- just conceptually represents a number of assumptions clients can make about the subject
- allows those clients to verify whether they have the knowledge on board to deal with that

As such, one could use conformsTo in RO-Crates in combination with

- data entities of type File, to express e.g. that the file is not just a netCDF file but conforms to the CF conventions, or that a CSV file sticks to some layout or schema, ...
- conceptual entities describing data services that e.g. conform to some web service API standard (like ogc-wms, erddap, ...)
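As a rough illustration of the two roles described above (declaring assumptions, and letting a client check whether it can deal with them), here is a small sketch. The identifier used for the CF conventions is a placeholder rather than the official URI, and the helper names and file paths are hypothetical.

```python
# Placeholder identifier; the real URI for the standard would go here.
CF_CONVENTIONS = "https://example.org/standards/cf-conventions"


def conforms_to(entity):
    """Return the set of standard identifiers an entity declares."""
    value = entity.get("conformsTo")
    if value is None:
        return set()
    items = value if isinstance(value, list) else [value]
    return {item["@id"] if isinstance(item, dict) else item for item in items}


def can_handle(entity, supported):
    """True if the client knows at least one standard the entity declares."""
    return bool(conforms_to(entity) & set(supported))


# A File data entity that is not just netCDF but also follows a convention.
data_file = {
    "@id": "data/temperature.nc",
    "@type": "File",
    "encodingFormat": "application/x-netcdf",
    "conformsTo": {"@id": CF_CONVENTIONS},
}
plain_file = {"@id": "data/notes.txt", "@type": "File"}

print(can_handle(data_file, {CF_CONVENTIONS}))
print(can_handle(plain_file, {CF_CONVENTIONS}))
```

The same check works unchanged on any entity in the graph, which is the point being made: the mechanism is not specific to the crate root.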

This way of applying dcterms:conformsTo exists outside the RO-Crate concept and can be applied to any part of it, as far as I can see. That the RO-Crate specification additionally introduced some specific suggestions for expressing conformity of RO-Crates was considered a useful and clear mechanism to guide people into a kind of "duck-type" declaration of valid assumptions about the crate contents. The fact that RO-Crate 1.2 introduces some guidance on this level

- does not in any way limit other usage of this mechanism (including the suggested nesting)
- nor should it raise the expectation that, because of that, the RO-Crate specification suddenly needs to control, document, or worse, forbid any more nested/detailed application of that same mechanism.

If anything, IMHO the RO-Crate spec should state that it deliberately does not want to interfere at that level of detail. And

elichad commented 1 week ago

Summarising discussion from community call 2024-11-14:

- Using the EntryPoint type is not strictly required, since entry points could be inferred by checking whether any conformsTo on an entity is an RO-Crate profile. However, it is convenient (for tooling) to make entry points explicit within the crate metadata.
- Using @type to indicate entry points may not be the best choice, as @type usually describes what the actual thing represented by the entity is (e.g. a File, a Person, a Place), and an entry point is just a construct in the metadata.
  - The existing EntryPoint type in schema.org is intended for describing API endpoints and such. This isn't quite the same as our idea, and we wouldn't want someone to describe an API using RO-Crate and end up with weird conflicts because of the overloaded type.
  - We could find a different property to use (but we haven't found a good one so far).
  - In our example we also included the entry points under "about" in the metadata entity, which is another way to make them easily discoverable by tools.
  - The ISA profile uses additionalType to indicate Investigation, Study, Assay (https://github.com/nfdi4plants/isa-ro-crate-profile/blob/release/profile/isa_ro_crate.md). Potentially do something similar to indicate entry points?
  - Do we need a “Crate” type?
- Highlighting entry points in a GUI: they are possible views of the crate.
- Profiles often talk about the root crate. The entry point would be a mechanism for profiles to talk about their root without that necessarily being the RO-Crate Root.
  - Want to make RO-Crate profiles more like mix-ins that don’t require being in an RO-Crate.
  - https://www.researchobject.org/ro-crate/specification/1.2-DRAFT/data-entities.html#referencing-other-ro-crates uses conformsTo and type Dataset to indicate an external crate. But what if it only conforms to a profile without being an RO-Crate?
  - RO-Crate V2 is moving towards "fragments". The “root” also does not have to be a Dataset. Increasingly, profiles of fragments can then be used. This idea fits well with that vision.
- If we add EntryPoint, there may only be certain properties that we should follow recursively to scope the “sub-crate”, e.g. hasPart, mentions, mainEntity. But which links should NOT be followed?
- Alternative: use named @graph { fragments } to isolate the scope? This can get quite complicated.
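The scoping question from the call can be explored with a small traversal sketch: starting from an entry point, follow only an allow-list of properties and collect whatever is reachable. The allow-list below is just the three properties named in the discussion, and the graph content and function names are made up for illustration; which links to follow or ignore is exactly the open question.

```python
from collections import deque

# Properties named in the discussion; an assumption, not an agreed list.
FOLLOW = ("hasPart", "mentions", "mainEntity")


def ids_of(value):
    """Normalise a JSON-LD property value to a list of @id strings."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [item["@id"] if isinstance(item, dict) else item for item in items]


def sub_crate(graph, entry_id, follow=FOLLOW):
    """Collect the @ids reachable from entry_id via the allowed properties."""
    by_id = {entity["@id"]: entity for entity in graph}
    seen = {entry_id}
    queue = deque([entry_id])
    while queue:
        entity = by_id.get(queue.popleft(), {})
        for prop in follow:
            for target in ids_of(entity.get(prop)):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return seen


# Hypothetical crate content.
graph = [
    {"@id": "./", "hasPart": [{"@id": "run-1/"}, {"@id": "README.md"}]},
    {
        "@id": "run-1/",
        "hasPart": [{"@id": "run-1/out.csv"}],
        "mainEntity": {"@id": "run-1/workflow.cwl"},
        "author": {"@id": "#alice"},  # not in FOLLOW, so not traversed
    },
    {"@id": "run-1/out.csv"},
    {"@id": "run-1/workflow.cwl"},
    {"@id": "#alice"},
    {"@id": "README.md"},
]

print(sorted(sub_crate(graph, "run-1/")))
```

In this sketch `#alice` falls outside the scope because `author` is not followed, which shows how a narrow allow-list can drop contextual entities that a consumer of the "sub-crate" might still expect.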

ResearchObject / ro-crate

Entry Points for RO-Crate Profiles #371