This repository documents the analysis of Workflow Run RO-Crates (WRROC) converted from CWLProv RO Bundles using runcrate. The results of this analysis are also published on Zenodo: https://doi.org/10.5281/zenodo.12689424.
The analysis follows the same methodology as previous work, in which we conducted a qualitative evaluation of metadata coverage in CWLProv (version 0.6.0). This earlier analysis was based on concrete examples of ROs associated with a realistic bioinformatics workflow. Here, we repeated the analysis for Workflow Run RO-Crate, and compared the WRROC RDF representation (in ro-crate-metadata.json) with the CWLProv RDF provenance graph.
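As a rough illustration of what inspecting the WRROC RDF representation can look like, the sketch below loads a JSON-LD document shaped like ro-crate-metadata.json and counts its entities by type. The embedded sample, its identifiers (such as #run-1) and the helper name count_entity_types are hypothetical and heavily trimmed; they are not taken from the crates analysed here.

```python
import json
from collections import Counter

# Hypothetical, heavily trimmed stand-in for a ro-crate-metadata.json file.
SAMPLE_METADATA = """
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "mainEntity": {"@id": "packed.cwl"}},
    {"@id": "packed.cwl", "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
    {"@id": "#run-1", "@type": "CreateAction", "instrument": {"@id": "packed.cwl"}}
  ]
}
"""


def count_entity_types(metadata: dict) -> Counter:
    """Count how often each @type occurs in the flattened @graph."""
    counts = Counter()
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        # @type may be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        counts.update(types)
    return counts


metadata = json.loads(SAMPLE_METADATA)
type_counts = count_entity_types(metadata)
for type_name, n in sorted(type_counts.items()):
    print(f"{type_name}: {n}")
```

For a real crate, the sample string would be replaced by the contents of the ro-crate-metadata.json file read from disk.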
We used the following approach and documented it in the Issues:
packed.cwl, and primary-job.json/primary-output.json)).
SoftwareRequirement
DockerRequirement
String, File, Directory and File array input parameters AND ResourceRequirement
Overview of the representation of each category of the provenance taxonomy, and its representation in RO-Crate. For a detailed explanation of each of the categories, see here: https://doi.org/10.5281/zenodo.7014950.
Explanation of the design of the workflow and its steps can be included in the CWL metadata fields (doc, label, intent).
ro-crate-metadata.json (RDF): full representation
Explanation of the meaning of individual input/output data entities can be represented as structured annotations in the CWL input parameter file (not propagated to ro-crate-metadata.json), but the CWL standards v1.2 give no clear guideline on how to make these annotations.
ro-crate-metadata.json (RDF): no representation
Workflow execution annotations (why was this combination of input parameters chosen?) can be represented as annotations in the CWL input parameter file (unstructured, not propagated to ro-crate-metadata.json).
ro-crate-metadata.json (RDF): no representation
This information can be added in the CWL input parameter file as structured annotations, but the CWL standards v1.2 give no clear guideline on how to make these annotations.
ro-crate-metadata.json (RDF): no representation
Filename and checksum are represented for all files; creation timestamps are available for output files. Additional structured annotations may be made in the CWL input parameter file. Filename and checksum are propagated to ro-crate-metadata.json.
ro-crate-metadata.json (RDF): partial representation
The CWL standards v1.2 allow specification of a remote location for data, which would serve as access to a downloadable form of the data.
ro-crate-metadata.json (RDF): no representation
Mapping of input/output data to workflow parameters is represented in ro-crate-metadata.json.
ro-crate-metadata.json (RDF): full representation
The SoftwareRequirement field is propagated to ro-crate-metadata.json. SoftwareRequirement contains a specs field with an IRI that resolves to a landing page with metadata about the tool (see CWL standards v1.2).
ro-crate-metadata.json (RDF): full representation
The SoftwareRequirement field is propagated to ro-crate-metadata.json.
ro-crate-metadata.json (RDF): full representation
The SoftwareRequirement field is propagated to ro-crate-metadata.json.
ro-crate-metadata.json (RDF): full representation
The workflow itself (packed.cwl) is contained in the CWLProv RO Bundle, as well as in the RO-Crate produced by runcrate. Metadata/documentation about the workflow can be represented in CWL metadata fields (doc, label, intent), which are propagated to ro-crate-metadata.json. ro-crate-metadata.json also contains a description of the workflow and all its parameters and steps. The representation of the workflow in CWLProv RDF is incomplete.
ro-crate-metadata.json (RDF): full representation
Information about the workflow parameters can be represented in the CWL metadata fields (doc, label, format).
ro-crate-metadata.json (RDF): full representation
The CWL ResourceRequirement field is partially propagated to ro-crate-metadata.json (Scenario 4).
ro-crate-metadata.json (RDF): partial representation
Absent.
Absent.
Container image is partially represented in the CWL DockerRequirement field, which is propagated to ro-crate-metadata.json (Scenario 3).
ro-crate-metadata.json (RDF): partial representation
ro-crate-metadata.json (RDF): full representation
Absent.
ro-crate-metadata.json (RDF): partial representation
ro-crate-metadata.json (RDF): full representation
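The mapping of input/output data to workflow parameters noted in the overview can be checked with a few lines of code. The sketch below assumes the Workflow Run RO-Crate convention that a run is a CreateAction whose object and result entries reference data entities, and that each data entity points to its FormalParameter through exampleOfWork. The sample graph, its identifiers and the helper names as_list and map_data_to_parameters are hypothetical and are not taken from the crates analysed here.

```python
import json

# Hypothetical, trimmed graph in the shape of a Workflow Run RO-Crate.
SAMPLE_METADATA = """
{
  "@graph": [
    {"@id": "packed.cwl#main/input_file", "@type": "FormalParameter", "name": "input_file"},
    {"@id": "packed.cwl#main/result", "@type": "FormalParameter", "name": "result"},
    {"@id": "data/in.txt", "@type": "File",
     "exampleOfWork": {"@id": "packed.cwl#main/input_file"}},
    {"@id": "data/out.txt", "@type": "File",
     "exampleOfWork": {"@id": "packed.cwl#main/result"}},
    {"@id": "#run-1", "@type": "CreateAction",
     "object": [{"@id": "data/in.txt"}],
     "result": [{"@id": "data/out.txt"}]}
  ]
}
"""


def as_list(value):
    """JSON-LD values may be absent, a single item, or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def map_data_to_parameters(metadata: dict) -> dict:
    """Return {data entity id: (role, [parameter names])} for every CreateAction."""
    by_id = {entity["@id"]: entity for entity in metadata.get("@graph", [])}
    mapping = {}
    for action in by_id.values():
        if "CreateAction" not in as_list(action.get("@type")):
            continue
        # "object" holds the inputs of the run, "result" holds its outputs.
        for role in ("object", "result"):
            for ref in as_list(action.get(role)):
                data = by_id.get(ref["@id"], {})
                params = [
                    # Fall back to the raw identifier if the parameter has no name.
                    by_id.get(p["@id"], {}).get("name", p["@id"])
                    for p in as_list(data.get("exampleOfWork"))
                ]
                mapping[ref["@id"]] = (role, params)
    return mapping


mapping = map_data_to_parameters(json.loads(SAMPLE_METADATA))
for data_id, (role, params) in mapping.items():
    print(f"{data_id} [{role}] -> {params}")
```

A data entity that comes back with an empty parameter list would indicate that the mapping is missing for that file in the crate being inspected.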