Open simleo opened 2 years ago
> so I decided to add the tools as `SoftwareApplication` entities with `@id` `packed.cwl#rev` and `packed.cwl#sorted` (whether this is correct is another matter: should they be `packed.cwl#main/rev` and `packed.cwl#main/sorted` instead?)
If used, it should be `packed.cwl#main/rev` and `packed.cwl#main/sorted`; there is neither a `#rev` nor a `#sorted` in that document.
Discussed at today's RO-Crate meeting:
Right, `packed.cwl#main/rev` would be the way to refer to `#main/rev` within `packed.cwl`. CWL is unusual in that it has slash-based fragments, but this is also possible with XPath selectors for XML docs.
We could still add a section about referencing parts of other documents (which may even be contextual entities in another RO-Crate, some other Linked Data document, or just a section in an HTML/PDF document), to clarify that you can use any URI/URI Reference with `#` in identifiers of contextual entities.
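To make the fragment discussion concrete, here is a minimal sketch using only Python's standard library. It shows that a slash-based fragment such as `#main/rev` splits and resolves like any other URI fragment. The base URL `https://example.org/crate/` and the helper name `split_section_id` are assumptions made up for this illustration; they are not part of the RO-Crate spec or of ro-crate-py.

```python
from typing import Tuple
from urllib.parse import urldefrag, urljoin


def split_section_id(entity_id: str) -> Tuple[str, str]:
    """Split an @id such as 'packed.cwl#main/rev' into (document, fragment)."""
    document, fragment = urldefrag(entity_id)
    return document, fragment


# Hypothetical crate location, used only to show how a relative @id resolves.
BASE = "https://example.org/crate/"

tool_id = "packed.cwl#main/rev"
document, fragment = split_section_id(tool_id)

# The slash inside the fragment does not affect resolution against the base.
resolved = urljoin(BASE, tool_id)

print(document)
print(fragment)
print(resolved)
```

The same split works for any document type: everything before `#` identifies the document in the crate, and everything after it is interpreted according to that document's format.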
There's now a part about document sections as part of profiles, which is not quite the right section for what this issue talks about. It uses `WebPageElement`, see https://www.researchobject.org/ro-crate/specification/1.2-DRAFT/data-entities.html#adding-detailed-descriptions-of-encodings
While converting a `cwltool --provenance` RO to a Workflow Run RO-Crate, I'm faced with the problem of referring to individual workflow steps. The workflow is stored in "packed" form, meaning that the tools that implement each step are stored in the same `packed.cwl` document as the workflow. For the packed form, CWL uses the URI fragment syntax to assign IDs to the steps and the workflow itself; in this case, they are:

- `#main`
- `#main/rev`
- `#main/sorted`

The workflow appears in the crate as a data entity with an `@id` of `packed.cwl`, so I decided to add the tools as `SoftwareApplication` entities with `@id` `packed.cwl#rev` and `packed.cwl#sorted` (whether this is correct is another matter: should they be `packed.cwl#main/rev` and `packed.cwl#main/sorted` instead?). Using fragments here seems quite reasonable, since the secondary resource is certainly "some portion or subset of the primary resource". However, should the tools be considered contextual entities or data entities?

At first I tried to add them as contextual entities:

Leading to:

Which does not really seem to work, due to the leading `#` in the tool IDs (ro-crate-py automatically adds a leading hash mark to contextual entity IDs if they're not full URIs: I'm not sure this is a MUST in the RO-Crate spec, but it's at least implied), so I tried adding them as data entities:

Leading to:

I think this is more correct since section IDs have a `document_id "#" fragment` structure. However, having `packed.cwl#rev` and `packed.cwl#sorted` listed in the crate's `hasPart` seems a bit weird. The current spec says "where files and folders are represented as Data Entities in the RO-Crate JSON-LD, these MUST be linked to, either directly or indirectly, from the Root Data Entity using the hasPart property". However, these are not files, but file sections, and would still be linked indirectly (via `packed.cwl`) if removed from the crate's `hasPart`. Therefore, I think the spec should say that such "sections" MAY be listed.

I've made use of the workflow step example throughout the above discussion, but it actually generalizes to referencing sections of a document of any kind, when the document is part of the crate.
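
The "linked indirectly" argument can be checked with a small sketch. The graph below is hand-written for illustration and is not the output of ro-crate-py or cwltool. It assumes that `packed.cwl` points to its sections through its own `hasPart`, and it uses the `#main/rev` and `#main/sorted` fragments suggested in the discussion. The function name `reachable_via_has_part` is likewise hypothetical.

```python
# Hand-written metadata graph (an assumption for illustration only):
# the root lists just packed.cwl, and packed.cwl lists its own sections.
ROOT_ID = "./"
graph = [
    {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "packed.cwl"}]},
    {
        "@id": "packed.cwl",
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        "hasPart": [
            {"@id": "packed.cwl#main/rev"},
            {"@id": "packed.cwl#main/sorted"},
        ],
    },
    {"@id": "packed.cwl#main/rev", "@type": "SoftwareApplication"},
    {"@id": "packed.cwl#main/sorted", "@type": "SoftwareApplication"},
]


def reachable_via_has_part(entities, root_id):
    """Return the @ids reachable from root_id by following hasPart links."""
    by_id = {entity["@id"]: entity for entity in entities}
    seen = set()
    stack = [root_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parts = by_id.get(current, {}).get("hasPart", [])
        if isinstance(parts, dict):  # JSON-LD allows a single object here
            parts = [parts]
        stack.extend(part["@id"] for part in parts)
    return seen


linked = reachable_via_has_part(graph, ROOT_ID)
print(sorted(linked))
```

Under these assumptions, both sections are reachable from the root even though the root's `hasPart` names only `packed.cwl`, which is the situation the proposed "MAY be listed" wording would cover.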