Closed simleo closed 7 months ago
Would using softwareRequirements coupled with mainEntity make sense?
{
  "@id": "foo.cwl",
  "@type": "SoftwareApplication",
  "name": "CWL wrapper of Foo",
  "programmingLanguage": {"@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl"},
  "softwareRequirements": [
    {"@id": "http://example.com/foo"},
    {"@id": "http://python.org/"}
  ],
  "mainEntity": {"@id": "http://example.com/foo"}
},
{
  "@id": "http://example.com/foo",
  "@type": "SoftwareApplication",
  "name": "Foo application"
},
{
  "@id": "http://python.org/",
  "@type": "SoftwareApplication",
  "name": "Python language"
},
{
  "@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl",
  "@type": "ProgrammingLanguage",
  "name": "Common Workflow Language"
}
I like this. I guess that gathering the required metadata programmatically would be very hard in general, though. The tool might not even be the first token in the command line (e.g., time samtools stats foo.bam), and getting the dependencies involves a nontrivial search that requires knowing at least the exact version of the tool (this is true whatever representation we adopt). So this is something that the crate author would probably have to fill in "manually", unless it's supported somehow at the workflow language level.
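To illustrate why the first token is not a reliable indicator, here is a minimal Python sketch of a naive heuristic that skips a few known "prefix" commands. The function name guess_tool and the PREFIX_COMMANDS set are hypothetical, made up for this example; nothing like this is part of any RO-Crate tooling as far as this thread is concerned.

```python
import shlex

# Hypothetical, incomplete set of commands that merely wrap another command.
PREFIX_COMMANDS = {"time", "nice", "nohup", "env", "sudo"}


def guess_tool(command_line):
    """Return a best-guess name for the wrapped tool, or None if nothing is found."""
    for token in shlex.split(command_line):
        # Skip known prefix commands, option flags and VAR=value assignments.
        if token in PREFIX_COMMANDS or token.startswith("-") or "=" in token:
            continue
        return token
    return None


print(guess_tool("time samtools stats foo.bam"))
print(guess_tool("samtools stats foo.bam"))
```

Both calls should yield samtools, but the heuristic breaks easily: a prefix option that takes a value (e.g., nice -n 10 samtools ...) would make it return the option value instead of the tool, which is the kind of fragility that suggests leaving mainEntity to the crate author.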
Commented in the meeting minutes today, adding here for posterity too. StackStorm allows workflow devs to create Actions for commands such as ping, docker, etc.
I think this, and maybe Airflow operators, could serve as options to compare how tool wrappers are used in workflows.
-Bruno
As discussed at the 2023-02-16 meeting, in CWL, workflow devs can make things easier by specifying the SoftwareRequirement hint, which could be used to fill in softwareRequirements. The mainEntity, as discussed above, would be harder to determine programmatically, but it's still nice to have it in the model for those cases where it can be filled in.
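As a rough sketch of how a SoftwareRequirement hint could be turned into softwareRequirements, the Python snippet below maps the list form of the hint's packages field (package, version, specs) to SoftwareApplication entities. The function name hint_to_entities, the choice of the first specs IRI as @id, and the fallback to a local "#name" identifier are all assumptions made for illustration, not an agreed mapping. The map form of packages is not handled.

```python
def hint_to_entities(hint):
    """Map a CWL SoftwareRequirement hint (already parsed to a dict) to
    a list of softwareRequirements references and the matching entities."""
    refs, entities = [], []
    for pkg in hint.get("packages", []):
        specs = pkg.get("specs") or []
        # Assumption: use the first spec IRI as the identifier, else a local one.
        entity_id = specs[0] if specs else "#" + pkg["package"]
        entity = {
            "@id": entity_id,
            "@type": "SoftwareApplication",
            "name": pkg["package"],
        }
        versions = pkg.get("version") or []
        if versions:
            # Assumption: only the first listed version is recorded.
            entity["softwareVersion"] = versions[0]
        refs.append({"@id": entity_id})
        entities.append(entity)
    return refs, entities


# Illustrative hint; the package name is a placeholder for this example.
hint = {
    "class": "SoftwareRequirement",
    "packages": [
        {
            "package": "interproscan",
            "specs": ["https://identifiers.org/rrid/RRID:SCR_005829"],
            "version": ["5.21-60"],
        }
    ],
}
refs, entities = hint_to_entities(hint)
print(refs)
print(entities)
```

The refs list would go into the wrapper's softwareRequirements, and the entities would be added to the crate's @graph. Nothing in the hint says which package is the "main" one, so mainEntity is left out here.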
Note that the "main dependency" would still not provide information on the wrapped executable. For instance, the equivalent of the CWL example linked above would be:
{
  "@id": "interproscan.cwl",
  "@type": "SoftwareApplication",
  "name": "CWL wrapper of InterProScan",
  "softwareRequirements": [
    {"@id": "https://identifiers.org/rrid/RRID:SCR_005829"}
  ],
  "mainEntity": {"@id": "https://identifiers.org/rrid/RRID:SCR_005829"}
},
{
  "@id": "https://identifiers.org/rrid/RRID:SCR_005829",
  "@type": "SoftwareApplication",
  "name": "InterProScan",
  "softwareVersion": "5.21-60"
}
Here there's no equivalent of baseCommand: interproscan.sh. Also note that some packages install multiple executables.
In Workflow Run Crates, to use softwareRequirements on the workflow, the SoftwareApplication type needs to be added to ["File", "SoftwareSourceCode", "ComputationalWorkflow"]. See #53.
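For crate-generating code, the type change could look roughly like the Python sketch below. The helper add_type and the workflow.cwl identifier are hypothetical names used only for this example.

```python
def add_type(entity, new_type):
    """Add new_type to the entity's @type, keeping existing types and order."""
    types = entity.get("@type", [])
    if isinstance(types, str):
        # A single type may be stored as a plain string; normalize to a list.
        types = [types]
    if new_type not in types:
        types = types + [new_type]
    entity["@type"] = types
    return entity


# Hypothetical workflow entity with the usual three types.
workflow = {
    "@id": "workflow.cwl",
    "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
}
add_type(workflow, "SoftwareApplication")
print(workflow["@type"])
```

Calling the helper a second time leaves @type unchanged, so it is safe to apply whenever softwareRequirements is about to be set on a workflow.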
Expected values for softwareRequirements are of type Text or URL, but not another SoftwareApplication. I guess we need to extend the target via ro-terms.
What is the script used to wrap up a software component?
We're mapping tool wrappers (e.g., foo.cwl) to SoftwareApplication. Wrappers at lower levels can also be SoftwareApplication, but we need to draw the line somewhere (related to container image).