PMCC-BioinformaticsCore / janis-core

Core python modules for Janis Pipeline workflow assistant
GNU General Public License v3.0

convert to WDL: force co-localization of secondary_files #90

Open mr-c opened 3 years ago

mr-c commented 3 years ago

See https://github.com/biowdl/tasks/pull/291 for a demonstration covering both individual files and arrays of files.

illusional commented 3 years ago

Thanks @mr-c, Janis should already do this on translation to WDL: https://github.com/PMCC-BioinformaticsCore/janis-core/blob/0b5b79fee09d3fcd6bf78bd7c883474eef1f3e9c/janis_core/translations/wdl.py#L2079

(Because I saw the same problems haha)

mr-c commented 3 years ago

@illusional Huh, I'm not seeing that behaviour for either a single File input with a secondary file specifier or an array of Files with a secondary file specifier:

secondary_files.cwl


cwlVersion: v1.2
class: CommandLineTool

arguments:

from janis_core import *
from janis_core.types.common_data_types import GenericFileWithSecondaries, File

Secondary_Files_V0_1_0 = CommandToolBuilder(
    tool="secondary_files",
    base_command=None,
    inputs=[
        ToolInput(
            tag="inp",
            input_type=GenericFileWithSecondaries(secondaries=["^.tar"]),
            doc=InputDocumentation(doc=None),
        )
    ],
    outputs=[
        ToolOutput(
            tag="outp",
            output_type=File(),
            selector=WildcardSelector(wildcard="lsout"),
            doc=OutputDocumentation(doc=None),
        )
    ],
    container="ubuntu:latest",
    version="v0.1.0",
    arguments=[
        ToolArgument(
            value="ls | grep -v lsout",
            position=None,
            doc=InputDocumentation(doc=None),
            shell_quote=False,
        )
    ],
)

if __name__ == "__main__":
    # or "cwl"
    Secondary_Files_V0_1_0().translate("wdl")

$ janis translate secondary_files_v0_1_0.py wdl -o .
2021-08-29T11:13:34 [INFO]: The command tool ({'tool': CommandTool, 'toolid': 'secondary_files'}).outp' used a star-bind (*) glob to find the output, but the return type was not an array. For WDL, the first element will be used, ie: 'glob("lsout")[0]'

> secondary_files_v0_1_0.wdl
``` wdl
version development

task secondary_files {
  input {
    Int? runtime_cpu
    Int? runtime_memory
    Int? runtime_seconds
    Int? runtime_disks
    File inp
    File inp_tar
  }
  command <<<
    set -e
     \
      ls | grep -v lsout
  >>>
  runtime {
    cpu: select_first([runtime_cpu, 1])
    disks: "local-disk ~{select_first([runtime_disks, 20])} SSD"
    docker: "ubuntu@sha256:1e48201ccc2ab83afc435394b3bf70af0fa0055215c1e26a5da9b50a1ae367c9"
    duration: select_first([runtime_seconds, 86400])
    memory: "~{select_first([runtime_memory, 4])}G"
    preemptible: 2
  }
  output {
    File outp = glob("lsout")[0]
  }
}
```

array_secondary_files.cwl


cwlVersion: v1.2
class: CommandLineTool

arguments:

task array_secondary_files {
  input {
    Int? runtime_cpu
    Int? runtime_memory
    Int? runtime_seconds
    Int? runtime_disks
    Array[File] input_list
    Array[File] input_list_tar
  }
  command <<<
    set -e
     \
      ls | grep -v lsout
  >>>
  runtime {
    cpu: select_first([runtime_cpu, 1])
    disks: "local-disk ~{select_first([runtime_disks, 20])} SSD"
    docker: "ubuntu@sha256:1e48201ccc2ab83afc435394b3bf70af0fa0055215c1e26a5da9b50a1ae367c9"
    duration: select_first([runtime_seconds, 86400])
    memory: "~{select_first([runtime_memory, 4])}G"
    preemptible: 2
  }
  output {
    File outp = glob("lsout")[0]
  }
}
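
As a rough illustration of the co-localization being asked for (a sketch only, not Janis's actual implementation), the Python below resolves a CWL-style secondary file pattern such as `^.tar` against a primary path and builds the `ln -s` commands that would place both files side by side. The function names `resolve_secondary` and `colocalize_commands`, the `workdir` parameter, and the example paths are all hypothetical. The one rule taken from CWL is that each leading `^` removes one extension from the primary path before the rest of the pattern is appended.

```python
import posixpath
import shlex


def resolve_secondary(primary: str, pattern: str) -> str:
    """Apply a CWL-style secondaryFiles pattern to a primary file path.

    Each leading '^' strips one extension from the primary path; the
    remainder of the pattern is then appended.
    """
    base = primary
    while pattern.startswith("^"):
        base, _ext = posixpath.splitext(base)
        pattern = pattern[1:]
    return base + pattern


def colocalize_commands(primary: str, secondary: str, pattern: str,
                        workdir: str = ".") -> list:
    """Build shell commands that symlink a primary file and its secondary
    file into `workdir`, under names that satisfy `pattern`.

    Hypothetical helper: it only builds command strings, it runs nothing.
    """
    linked_primary = posixpath.join(workdir, posixpath.basename(primary))
    linked_secondary = resolve_secondary(linked_primary, pattern)
    return [
        f"ln -s {shlex.quote(primary)} {shlex.quote(linked_primary)}",
        f"ln -s {shlex.quote(secondary)} {shlex.quote(linked_secondary)}",
    ]


if __name__ == "__main__":
    # Hypothetical localized paths for `inp` and `inp_tar`, which an engine
    # may place in different directories.
    for line in colocalize_commands(
        "/inputs/1/sample.txt", "/inputs/2/sample.tar", "^.tar"
    ):
        print(line)
```

Commands like these could be emitted at the top of the WDL `command` block, with the tool then pointed at the linked primary path, so that the secondary file is found next to the primary even when `inp` and `inp_tar` are localized separately. Whether symlinks, hard links or copies are appropriate depends on the execution backend.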