common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0

Symlinking input to docker on Mac links in local, not the host VM #55

Open prismofeverything opened 8 years ago

prismofeverything commented 8 years ago

I am trying to run cwltool on a Mac with a workflow that invokes a Docker container. When I specify the input to cwltool, it tries to create a symlink to the input file inside the host VM so that the container can access it. But since I am running cwltool locally, it creates the link locally instead, which causes the container to fail because it can't locate the input file inside the host VM.

Here is the command line I am using:

cwl-runner --debug samtools-workflow.cwl --input test/input/original.bam

and I end up with this symlink in my local directory:

indexed.bam -> test/input/original.bam

I end up with these lines in the debug output:

[job samindex] /var/folders/2l/0wpdpqws4jvg9lqjwrhvdcl8_c3ksp/T/tmpTvbsTy$ docker run -i --volume=/Users/spanglry/Code/pipelines-api-examples/samtools/test/input/original.bam:/var/lib/cwl/job956205074_input/original.bam:ro --volume=/var/folders/2l/0wpdpqws4jvg9lqjwrhvdcl8_c3ksp/T/tmpTvbsTy:/var/spool/cwl:rw --volume=/var/folders/2l/0wpdpqws4jvg9lqjwrhvdcl8_c3ksp/T/tmpYpwv0a:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1000 --rm --env=TMPDIR=/tmp sha256:f8de369e9dddf875c5f53c5aada66596d12affccf8b96da15a00c48a1b3a4be9 samtools index indexed.bam
symlinking /var/folders/2l/0wpdpqws4jvg9lqjwrhvdcl8_c3ksp/T/tmpTvbsTy/indexed.bam to /var/lib/cwl/job956205074_input/original.bam
open: No such file or directory
[bam_index_build2] fail to open the BAM file.

Contents of samtools-workflow.cwl -----------------

#!/usr/bin/env cwl-runner

class: Workflow

inputs:
  - id: "#input"
    type: File
    description: "bam file"

outputs:
  - id: "#bam"
    type: File
    source: "#samindex.bam_with_bai"

hints:
  - class: DockerRequirement
    dockerLoad: gcr.io/level-elevator-714/samtools
    dockerImageId: sha256:f8de369e9dddf875c5f53c5aada66596d12affccf8b96da15a00c48a1b3a4be9

steps:
  - id: "#samindex"
    run: { import: samtools-index.cwl }
    inputs:
      - { id: "#samindex.input", source: "#input" }
    outputs:
      - { id: "#samindex.bam_with_bai" }

Contents of samtools-index.cwl -------------------

#!/usr/bin/env cwl-runner

class: CommandLineTool

description: "Invoke 'samtools index' to create a 'BAI' index (samtools 1.19)"

requirements:
  - class: CreateFileRequirement
    fileDef:
      - filename: indexed.bam
        fileContent:
          engine: "cwl:JsonPointer"
          script: "job/input"

inputs:
  - id: "#input"
    type: File
    description:
      Input bam file.

outputs:
  - id: "#bam_with_bai"
    type: File
    outputBinding:
      glob: "indexed.bam"
      secondaryFiles:
        - ".bai"

baseCommand: ["samtools", "index"]

arguments:
  - "indexed.bam"
  - "indexed.bam.bai"

Questions:

Is there some way to indicate that the input should be symlinked inside the Docker host rather than on the machine running the script?

Am I doing something else wrong here?

Any help would be greatly appreciated, thank you!

mr-c commented 8 years ago

Hello @prismofeverything, thank you for reporting this issue. I don't have a Mac testbed, but many others do. Let me know if you don't get a response within a week.

prismofeverything commented 8 years ago

Hello @mr-c, it has been a week : )

portah commented 8 years ago

@prismofeverything There are a few things to pay attention to, two of which are specific to Mac. First, Docker runs inside a (boot2docker) VM. Second, the only read-write path available inside Docker's VM is the /Users/ directory. So I think this error is related to that:

symlinking /var/folders/2l/0wpdpqws4jvg9lqjwrhvdcl8_c3ksp/T/tmpTvbsTy/indexed.bam to /var/lib/cwl/job956205074_input/original.bam

@tetron has put notes for Mac users somewhere on GitHub about --tmpdir-prefix and --tmp-outdir-prefix. So try passing these tmp parameters to cwltool and pointing them into the /Users directory.

The other question: with --input test/input/original.bam, is it necessary to keep the bam file somewhere other than the ./ directory?
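As a rough illustration of the suggestion above, here is a minimal Python sketch that builds a cwl-runner command line with both temp-directory prefixes pointed under /Users. The helper names (`is_shared_with_vm`, `cwl_runner_command`) and the `/Users/poe/cwl-tmp` directory are hypothetical, chosen only for this example. The check assumes /Users is the only host path shared with the Docker VM, as described in this thread.

```python
import posixpath

# Assumption: on a boot2docker-style setup, only this host directory
# is shared with the Docker VM (as described in this thread).
SHARED_ROOT = "/Users"


def is_shared_with_vm(path, shared_root=SHARED_ROOT):
    """Return True if path lies inside the directory shared with the Docker VM."""
    path = posixpath.normpath(path)
    root = posixpath.normpath(shared_root)
    return path == root or path.startswith(root + "/")


def cwl_runner_command(workflow, job_args, tmp_root):
    """Build a cwl-runner argv that keeps temporary directories under tmp_root."""
    if not is_shared_with_vm(tmp_root):
        raise ValueError(f"{tmp_root} is not under {SHARED_ROOT}")
    # The prefix is a path prefix: temp dirs are created as <tmp_root>/tmpXXXXXX.
    prefix = posixpath.join(tmp_root, "tmp")
    return [
        "cwl-runner", "--debug",
        f"--tmpdir-prefix={prefix}",
        f"--tmp-outdir-prefix={prefix}",
        workflow, *job_args,
    ]


# Hypothetical temp location; any existing directory under /Users should do.
cmd = cwl_runner_command(
    "samtools-workflow.cwl",
    ["--input", "test/input/original.bam"],
    "/Users/poe/cwl-tmp",
)
print(" ".join(cmd))
```

The printed line can be pasted into a terminal once the chosen temp directory exists. The default /var/folders/... temp location from the debug output above would be rejected by the check.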

prismofeverything commented 8 years ago

@portah Aha, yes, that was it. Though it still leaves the symlink and some temp dirs and files lying around the filesystem... at least it works now! Thank you.

├── indexed.bam -> /Users/poe/Code/samtools/test/input/original.bam
├── test3tuNdO
└── testGrHJZ9

Is the intention for cwltool to create these files, symlinks and directories inside the docker container? It seems weird that they are left hanging around... maybe an artifact of the nested VMs?