Closed simleo closed 2 years ago
I've tried to map all (hopefully!) CWL types, using test/data/type-zoo-run-1/snapshot/type_zoo.cwl as a test case. Here are some notes on each:
string: map to Text.
{
"@id": "#param-main/in_str",
"@type": "FormalParameter",
"additionalType": "Text",
"name": "main/in_str"
},
{
"@id": "#pv-main/in_str",
"@type": "PropertyValue",
"exampleOfWork": {"@id": "#param-main/in_str"},
"name": "main/in_str",
"value": "spam"
}
array: map to the element type (e.g., string[] to Text). A property in RO-Crate can have a single value or multiple values, so there should be no need to do anything special here.
{
"@id": "#param-main/in_array",
"@type": "FormalParameter",
"additionalType": "Text",
"name": "main/in_array"
},
{
"@id": "#pv-main/in_array",
"@type": "PropertyValue",
"exampleOfWork": {"@id": "#param-main/in_array"},
"name": "main/in_array",
"value": ["foo", "bar"]
}
If we want to be more specific, we could add a PropertyValueSpecification with multipleValues set to True. I'm not sure how to tie it to the parameter, though. I briefly skimmed through https://www.w3.org/wiki/images/1/10/PotentialActionsApril11.pdf, which looks like it could be relevant.
On a side note, it looks like ro-crate-py does not handle "value": ["foo", "bar"] correctly. We need to check Entity's magic item getter / setter.
Any: I'd map this to DataType. However, the converter does not currently parse the workflow file, and the provenance files don't have this information: in the XML, the type is reported as xsd:string (i.e., the type of the actual value that was passed in the job config file). For now I'm just inferring the type from the deserialized JSON object anyway.
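The inference from the deserialized JSON object mentioned above could look roughly like the following. This is a minimal sketch, not the converter's actual code: `PY_TO_SCHEMA` and `infer_additional_type` are hypothetical names, the `DataType` fallback is an assumption, and CWL File / Directory objects are not handled.

```python
import json

# Hypothetical lookup table; the Schema.org names follow the mappings in these notes.
PY_TO_SCHEMA = {
    bool: "Boolean",
    int: "Integer",
    float: "Float",
    str: "Text",
    dict: "PropertyValue",  # records; File / Directory objects are not handled here
}


def infer_additional_type(value):
    """Guess a parameter's additionalType from a deserialized JSON value."""
    if isinstance(value, list):
        # Arrays map to the type(s) of their elements.
        names = set()
        for item in value:
            t = infer_additional_type(item)
            names.update(t if isinstance(t, list) else [t])
        if not names:
            return "DataType"  # assumption: empty arrays carry no type information
        ordered = sorted(names)
        return ordered[0] if len(ordered) == 1 else ordered
    # Exact type lookup rather than isinstance(): bool is a subclass of int.
    return PY_TO_SCHEMA.get(type(value), "DataType")


# Values as they would come out of a JSON job config file.
job = json.loads('{"in_str": "spam", "in_bool": true, "in_array": ["foo", "bar"]}')
inferred = {name: infer_additional_type(value) for name, value in job.items()}
```

Because the lookup only sees the value that was passed, a parameter declared as Any or as a union in the workflow is reported with the narrower type of that value.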
boolean: map to Boolean.
{
"@id": "#param-main/in_bool",
"@type": "FormalParameter",
"additionalType": "Boolean",
"name": "main/in_bool"
},
{
"@id": "#pv-main/in_bool",
"@type": "PropertyValue",
"exampleOfWork": {"@id": "#param-main/in_bool"},
"name": "main/in_bool",
"value": "True"
},
Conveniently, str(True) yields "True", which matches the name of the Schema.org Boolean value True, and the same goes for False. So we can simply use str to serialize booleans.
int / long: map to Integer.
{
"@id": "#param-main/in_int",
"@type": "FormalParameter",
"additionalType": "Integer",
"name": "main/in_int"
},
{
"@id": "#pv-main/in_int",
"@type": "PropertyValue",
"exampleOfWork": {"@id": "#param-main/in_int"},
"name": "main/in_int",
"value": "42"
},
Note that the value is serialized as a string: I followed https://schema.org/PropertyValue#eg-0404 (JSON-LD tab). Is there any relevant Schema.org recommendation anywhere? The RO-Crate spec should probably say something about this.
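As a rough illustration of the string serialization described for booleans and integers, here is a small sketch that builds a PropertyValue entity like the ones shown above. `serialize_value` and `make_property_value` are hypothetical helpers, not part of the converter or of ro-crate-py.

```python
def serialize_value(value):
    """Turn a scalar job value into the string stored in PropertyValue.value."""
    if isinstance(value, (bool, int, float, str)):
        # str(True) gives "True" and str(42) gives "42", as in the examples above.
        return str(value)
    raise TypeError(f"unsupported scalar type: {type(value).__name__}")


def make_property_value(param_name, value):
    """Build the PropertyValue entity for one scalar parameter value."""
    return {
        "@id": f"#pv-{param_name}",
        "@type": "PropertyValue",
        "exampleOfWork": {"@id": f"#param-{param_name}"},
        "name": param_name,
        "value": serialize_value(value),
    }


pv_int = make_property_value("main/in_int", 42)
pv_bool = make_property_value("main/in_bool", True)
```

Arrays and records are left out on purpose, since they need the multi-valued and nested handling discussed in their own notes.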
float / double: map to Float, same considerations as int / long.
Union types: map to an array with the mapping of each member type, e.g., [int, float] to ["Integer", "Float"]. This info is not available from the provenance files, so for now the converter reports the type of the value passed in the job config.
What should we do about optional params, e.g., [int, "null"]? Again, PropertyValueSpecification might be useful here, since it has a valueRequired property.
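If the workflow file were parsed, the union mapping could be sketched as below. This is only an illustration under assumptions: `CWL_TO_SCHEMA` and `map_union` are hypothetical names, only scalar member types are covered, and the returned flag is merely a candidate for valueRequired, since where to attach it is still an open question.

```python
# Hypothetical table of scalar CWL types, following the mappings in these notes.
CWL_TO_SCHEMA = {
    "string": "Text",
    "boolean": "Boolean",
    "int": "Integer",
    "long": "Integer",
    "float": "Float",
    "double": "Float",
}


def map_union(cwl_types):
    """Return (additionalType, required) for a CWL union type list."""
    # A "null" member marks the parameter as optional.
    required = "null" not in cwl_types
    mapped = []
    for cwl_type in cwl_types:
        if cwl_type == "null":
            continue
        name = CWL_TO_SCHEMA[cwl_type]
        if name not in mapped:  # e.g. int and long both map to Integer
            mapped.append(name)
    additional_type = mapped[0] if len(mapped) == 1 else mapped
    return additional_type, required
```

For example, `map_union(["int", "float"])` gives `(["Integer", "Float"], True)`, while `map_union(["int", "null"])` gives `("Integer", False)`.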
enum: map to Text. Not sure if there's a way to specify a set of predefined allowed values.
record: map to PropertyValue. It actually maps to an array of PropertyValues, but RO-Crate properties can have multiple values, so the same considerations we made for array apply. To serialize the actual value of {"in_record_A": "Tom", "in_record_B": "Jerry"}, I've used nested PropertyValues, with the record keys as additional slash-separated fields in the @id:
{
"@id": "#param-main/in_record",
"@type": "FormalParameter",
"additionalType": "PropertyValue",
"name": "main/in_record"
},
{
"@id": "#pv-main/in_record",
"@type": "PropertyValue",
"exampleOfWork": {"@id": "#param-main/in_record"},
"name": "main/in_record",
"value": [
{"@id": "#pv-main/in_record/in_record_A"},
{"@id": "#pv-main/in_record/in_record_B"}
]
},
{
"@id": "#pv-main/in_record/in_record_A",
"@type": "PropertyValue",
"name": "main/in_record/in_record_A",
"value": "Tom"
},
{
"@id": "#pv-main/in_record/in_record_B",
"@type": "PropertyValue",
"name": "main/in_record/in_record_B",
"value": "Jerry"
},
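The nested layout above can be generated mechanically from the record value. The following is a minimal sketch assuming flat records with scalar fields; `record_to_entities` is a hypothetical helper, not the converter's actual function.

```python
def record_to_entities(param_name, record):
    """Build the parent PropertyValue and one nested PropertyValue per record key."""
    children = []
    for key, value in record.items():
        # The record key becomes an extra slash-separated field in @id and name.
        child_name = f"{param_name}/{key}"
        children.append({
            "@id": f"#pv-{child_name}",
            "@type": "PropertyValue",
            "name": child_name,
            "value": str(value),
        })
    parent = {
        "@id": f"#pv-{param_name}",
        "@type": "PropertyValue",
        "exampleOfWork": {"@id": f"#param-{param_name}"},
        "name": param_name,
        # The parent only references its children by @id.
        "value": [{"@id": child["@id"]} for child in children],
    }
    return [parent] + children


entities = record_to_entities(
    "main/in_record", {"in_record_A": "Tom", "in_record_B": "Jerry"}
)
```

Nested records would need a recursive call in place of `str(value)`; that case is left out here.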
In general, the mapping between workflow inputs and step inputs is / should be available from the prospective provenance part (workflow file). For instance, packed.cwl has entries like:
{
"source": "#main/input",
"id": "#main/rev/input"
}
Here, "source" represents the workflow input and "id" the step input. However, in the case of files, primary.cwlprov.* alone is sufficient to infer such mappings, since the two roles eventually map to the same artifact.
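Collecting these "source" / "id" pairs from the workflow file could look like the sketch below. It assumes the packed file is JSON with a top-level "$graph" list whose Workflow entries carry "steps", each with an "in" list of entries like the one shown above; that layout and the `step_input_sources` name are assumptions, so check them against the actual packed.cwl.

```python
def step_input_sources(packed):
    """Map each step input id to the list of sources that feed it."""
    mapping = {}
    for process in packed.get("$graph", []):
        if process.get("class") != "Workflow":
            continue
        for step in process.get("steps", []):
            for step_in in step.get("in", []):
                source = step_in.get("source")
                if source is None:
                    continue  # step input with no upstream link
                # "source" may be a single id or a list of ids.
                sources = source if isinstance(source, list) else [source]
                mapping[step_in["id"]] = sources
    return mapping


# Tiny hand-written stand-in for a packed workflow (not the real revsort file).
packed = {
    "$graph": [
        {
            "class": "Workflow",
            "id": "#main",
            "steps": [
                {
                    "id": "#main/rev",
                    "in": [{"source": "#main/input", "id": "#main/rev/input"}],
                }
            ],
        }
    ]
}
links = step_input_sources(packed)
```

With the stand-in above, `links` maps "#main/rev/input" to ["#main/input"].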
Merging so we get the rendering of the revsort example (should appear at https://www.researchobject.org/workflow-run-crate/examples/draft/revsort-run-1-crate/). We can do more work on this in future PRs.
Adds a tool to generate a Workflow Run RO-Crate from CWLProv output. For now it's monolithic.