ResearchObject / workflow-run-crate

Workflow Run RO-Crate profile
https://www.researchobject.org/workflow-run-crate/
Apache License 2.0

CQ10 - Tool wrappers #18

Closed: simleo closed this issue 7 months ago

simleo commented 2 years ago

What is the script used to wrap up a software component?

We're mapping tool wrappers (e.g., foo.cwl) to SoftwareApplication. Wrappers at lower levels can also be SoftwareApplication, but we need to draw the line somewhere (related to container image).

stain commented 1 year ago

Would using softwareRequirements coupled with mainEntity make sense?

{
  "@id": "foo.cwl",
  "@type": "SoftwareApplication",
  "name": "CWL wrapper of Foo",
  "programmingLanguage": {"@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl"},
  "softwareRequirements": [
      {"@id": "http://example.com/foo"},
      {"@id": "http://python.org/"}
  ],
  "mainEntity": {"@id": "http://example.com/foo"}
},
{
  "@id": "http://example.com/foo",
  "@type": "SoftwareApplication",
  "name": "Foo application"
},
{
  "@id": "http://python.org/",
  "@type": "SoftwareApplication",
  "name": "Python language"
},
{
  "@id": "https://w3id.org/workflowhub/workflow-ro-crate#cwl",
  "@type": "ProgrammingLanguage",
  "name": "Common Workflow Language"
}

simleo commented 1 year ago

> Would using softwareRequirements coupled with mainEntity make sense?

I like this. I guess that gathering the required metadata programmatically would be very hard in general, though. The tool might not even be the first token in the command line (e.g., time samtools stats foo.bam), and getting the dependencies involves a nontrivial search that requires knowing at least the exact version of the tool (this is true whatever representation we adopt). So this is something that the crate author would probably have to fill in "manually", unless it's supported somehow at the workflow language level.
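To illustrate why the wrapped tool is hard to pick out of a command line, here is a minimal Python sketch of a naive heuristic. The function name `guess_tool` and the `PREFIX_COMMANDS` set are hypothetical and deliberately incomplete; the heuristic breaks as soon as a prefix command takes an option with a separate value, which is part of the point being made above.

```python
import os
import shlex

# Hypothetical, non-exhaustive set of commands that merely wrap another command.
PREFIX_COMMANDS = {"time", "nohup", "env"}


def guess_tool(command_line):
    """Return a best guess for the name of the wrapped tool, or None."""
    for token in shlex.split(command_line):
        name = os.path.basename(token)
        if name in PREFIX_COMMANDS:
            continue  # e.g. "time" in "time samtools stats foo.bam"
        if token.startswith("-") or ("=" in token and "/" not in token):
            continue  # option of a prefix command, or VAR=value after "env"
        return name
    return None


print(guess_tool("time samtools stats foo.bam"))
```

Even when the guess is right, it only yields a name: the exact version and the dependencies still have to come from somewhere else.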

kinow commented 1 year ago

I commented on this in today's meeting minutes; adding it here for posterity too. StackStorm allows workflow devs to create Actions for commands such as ping, docker, etc.

I think this, and maybe Airflow operators, could serve as options to compare how tool wrappers are used in workflows.

-Bruno

simleo commented 1 year ago

As discussed at the 2023-02-16 meeting, in CWL, workflow devs can make things easier by specifying the SoftwareRequirement hint, which could be used to fill in softwareRequirements. The mainEntity, as discussed above, would be harder to determine programmatically, but it's still nice to have it in the model for those cases where it can be filled in.

Note that the "main dependency" would still not provide information on the wrapped executable. For instance, the equivalent of the CWL example linked above would be:

{
    "@id": "interproscan.cwl",
    "@type": "SoftwareApplication",
    "name": "CWL wrapper of InterProScan",
    "softwareRequirements": [
        {"@id": "https://identifiers.org/rrid/RRID:SCR_005829"}
    ],
    "mainEntity": {"@id": "https://identifiers.org/rrid/RRID:SCR_005829"}
},
{
    "@id": "https://identifiers.org/rrid/RRID:SCR_005829",
    "@type": "SoftwareApplication",
    "name": "InterProScan",
    "softwareVersion": "5.21-60"
}

Here there's no equivalent of baseCommand: interproscan.sh, so the wrapped executable itself is not recorded. Also note that some packages install multiple executables.
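A rough Python sketch of how the SoftwareRequirement hint could be turned into the entities shown in this comment. It assumes the CWL document has already been parsed into a plain dict (for instance by a YAML loader) and handles both the map and list forms of `hints` and `packages`. The function name and the choice of the first `specs` IRI as `@id` (with a local `#name` fallback) are assumptions, not something the profile prescribes. `mainEntity` is left out on purpose, since it can't be derived reliably from the hint.

```python
def software_requirements_from_cwl(tool_id, name, cwl_doc):
    """Build JSON-LD entities from a CWL SoftwareRequirement hint.

    cwl_doc is the tool description already parsed into a dict.
    """
    hints = cwl_doc.get("hints", {})
    if isinstance(hints, list):  # list form: [{"class": "SoftwareRequirement", ...}]
        hints = {h["class"]: h for h in hints if "class" in h}
    packages = hints.get("SoftwareRequirement", {}).get("packages", {})
    if isinstance(packages, list):  # list form: [{"package": "interproscan", ...}]
        packages = {p["package"]: p for p in packages}

    wrapper = {
        "@id": tool_id,
        "@type": "SoftwareApplication",
        "name": name,
        "softwareRequirements": [],
    }
    entities = [wrapper]
    for pkg_name, pkg in packages.items():
        pkg = pkg or {}
        specs = pkg.get("specs") or []
        # Assumption: use the first spec IRI as @id, else a local identifier.
        pkg_id = specs[0] if specs else f"#{pkg_name}"
        entity = {"@id": pkg_id, "@type": "SoftwareApplication", "name": pkg_name}
        versions = pkg.get("version") or []
        if versions:
            # Assumption: record only the first listed version.
            entity["softwareVersion"] = versions[0]
        wrapper["softwareRequirements"].append({"@id": pkg_id})
        entities.append(entity)
    return entities
```

Note that `baseCommand` is ignored here, matching the limitation described above.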

simleo commented 1 year ago

In Workflow Run Crates, to use softwareRequirements on the workflow, the SoftwareApplication type needs to be added to ["File", "SoftwareSourceCode", "ComputationalWorkflow"]. See #53.
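As a small sketch of what that means for a crate generator, the helper below appends a type to an entity's `@type` without duplicating it. The helper name `add_type` and the `workflow.cwl` identifier are hypothetical.

```python
def add_type(entity, new_type):
    """Add new_type to the entity's @type, keeping existing types and order."""
    types = entity.get("@type", [])
    if isinstance(types, str):  # JSON-LD allows a single string here
        types = [types]
    if new_type not in types:
        types = [*types, new_type]
    entity["@type"] = types
    return entity


workflow = {
    "@id": "workflow.cwl",  # hypothetical identifier
    "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
}
add_type(workflow, "SoftwareApplication")
```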

simleo commented 7 months ago

Expected values for softwareRequirements are of type Text or URL, but not another SoftwareApplication. I guess we need to extend the target via ro-terms.