common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0

error if symlinks with absolute paths are present inside Directory outputs #1461

Open mr-c opened 3 years ago

mr-c commented 3 years ago
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
inputs: []
baseCommand: [ bash, -c ]
arguments:
 - "mkdir foo; echo 42 > foo/bar; ln -s  $PWD/foo/bar foo/baz"
outputs:
  result:
    type: Directory
    outputBinding:
      glob: foo

Short-term workaround: use --copy-outputs, but that leaves intermediate files lying around afterwards.

Example error:

$ TMPDIR=$PWD cwltool   --debug tests/symlinks.cwl
INFO /home/michael/ebi/env/bin/cwltool 3.1.20210623153106
INFO Resolved 'tests/symlinks.cwl' to 'file:///home/michael/cwltool/tests/symlinks.cwl'
DEBUG Parsed job order from command line: {
    "id": "tests/symlinks.cwl"
}
DEBUG [job symlinks.cwl] initializing from file:///home/michael/cwltool/tests/symlinks.cwl
DEBUG [job symlinks.cwl] {}
DEBUG [job symlinks.cwl] path mappings is {}
DEBUG [job symlinks.cwl] command line bindings is [
    {
        "position": [
            -1000000,
            0
        ],
        "datum": "bash"
    },
    {
        "position": [
            -1000000,
            1
        ],
        "datum": "-c"
    },
    {
        "position": [
            0,
            0
        ],
        "datum": "mkdir foo; echo 42 > foo/bar; ln -s $PWD/foo/bar foo/baz"
    }
]
DEBUG [job symlinks.cwl] initial work dir {}
INFO [job symlinks.cwl] /home/michael/cwltool/l6s28ubx$ bash \
    -c \
    'mkdir foo; echo 42 > foo/bar; ln -s $PWD/foo/bar foo/baz'
DEBUG Could not collect memory usage, job ended before monitoring began.
INFO [job symlinks.cwl] completed success
DEBUG [job symlinks.cwl] outputs {
    "result": {
        "location": "file:///home/michael/cwltool/l6s28ubx/foo",
        "basename": "foo",
        "nameroot": "foo",
        "nameext": "",
        "class": "Directory"
    }
}
DEBUG [job symlinks.cwl] Removing input staging directory /home/michael/cwltool/tp94m_vi
DEBUG [job symlinks.cwl] Removing temporary directory /home/michael/cwltool/2zomy12a
DEBUG Moving /home/michael/cwltool/l6s28ubx/foo to /home/michael/cwltool/foo
DEBUG Moving /home/michael/cwltool/l6s28ubx/foo/bar to /home/michael/cwltool/foo/bar
DEBUG Moving /home/michael/cwltool/l6s28ubx/foo/bar to /home/michael/cwltool/foo/baz
ERROR Unhandled error:
  [Errno 2] No such file or directory: '/home/michael/cwltool/l6s28ubx/foo/bar'
Traceback (most recent call last):
  File "/usr/lib/python3.9/shutil.py", line 806, in move
    os.rename(src, real_dst)
FileNotFoundError: [Errno 2] No such file or directory: '/home/michael/cwltool/l6s28ubx/foo/bar' -> '/home/michael/cwltool/foo/baz'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/michael/cwltool/cwltool/main.py", line 1248, in main
    (out, status) = real_executor(
  File "/home/michael/cwltool/cwltool/executors.py", line 59, in __call__
    return self.execute(process, job_order_object, runtime_context, logger)
  File "/home/michael/cwltool/cwltool/executors.py", line 155, in execute
    self.final_output[0] = relocateOutputs(
  File "/home/michael/cwltool/cwltool/process.py", line 400, in relocateOutputs
    stage_files(pm, stage_func=_relocate, symlink=False, fix_conflicts=True)
  File "/home/michael/cwltool/cwltool/process.py", line 296, in stage_files
    stage_func(entry.resolved, entry.target)
  File "/home/michael/cwltool/cwltool/process.py", line 371, in _relocate
    _relocate(dir_entry.path, fs_access.join(dst, dir_entry.name))
  File "/home/michael/cwltool/cwltool/process.py", line 373, in _relocate
    shutil.move(src, dst)
  File "/usr/lib/python3.9/shutil.py", line 820, in move
    copy_function(src, real_dst)
  File "/usr/lib/python3.9/shutil.py", line 435, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.9/shutil.py", line 264, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/home/michael/cwltool/l6s28ubx/foo/bar'
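
The same failure can be reproduced outside cwltool. Below is a minimal sketch, assuming the cause is what the DEBUG "Moving" lines suggest: each entry is moved by its resolved path, so by the time foo/baz is handled its absolute target foo/bar has already been moved away. The helper names (make_output_dir, move_entries_resolved, reproduce) are hypothetical and this is not cwltool's actual relocation code.

```python
import os
import shutil
import tempfile


def make_output_dir(work):
    """Recreate the tool's output: foo/bar plus foo/baz -> absolute path of foo/bar."""
    foo = os.path.join(work, "foo")
    os.mkdir(foo)
    with open(os.path.join(foo, "bar"), "w") as handle:
        handle.write("42\n")
    # Same as `ln -s $PWD/foo/bar foo/baz`: the link stores an absolute target.
    os.symlink(os.path.join(foo, "bar"), os.path.join(foo, "baz"))
    return foo


def move_entries_resolved(src_dir, dst_dir):
    """Move every entry using its resolved source path (assumed behaviour)."""
    os.makedirs(dst_dir)
    for name in sorted(os.listdir(src_dir)):
        resolved = os.path.realpath(os.path.join(src_dir, name))
        shutil.move(resolved, os.path.join(dst_dir, name))


def reproduce():
    """Return the exception raised while relocating, or None if nothing failed."""
    with tempfile.TemporaryDirectory() as root:
        work = os.path.join(root, "work")
        os.mkdir(work)
        foo = make_output_dir(work)
        try:
            move_entries_resolved(foo, os.path.join(root, "out", "foo"))
        except FileNotFoundError as err:
            return err
    return None


if __name__ == "__main__":
    print(repr(reproduce()))
```

In this sketch "bar" is moved first, so the resolved source for "baz" no longer exists when shutil.move reaches it, which matches the order of the "Moving" lines in the log above.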

mr-c commented 3 years ago

The fix and/or its test seem flaky, so I reverted it in https://github.com/common-workflow-language/cwltool/pull/1483

ElderMedic commented 7 months ago

Hi, is there any chance this can be fixed in the future? We have a CommandLineTool that uses a bash script at runtime to move all intermediate files and folders to the destination path:

requirements:
  InlineJavascriptRequirement: {}
  ShellCommandRequirement: {}

  InitialWorkDirRequirement:
    listing:
        - entryname: mv.sh
          entry: |-
            shift;
            mkdir $(inputs.destination); 
            # Move each file individually, ignoring non-existing files
            for file in $@; do
              if [ -e "$file" ]; then
                mv -n "$file" "$(inputs.destination)"
              fi
              ls $(inputs.destination)
            done
inputs:
  files:
    type: File[]?
    inputBinding:
      position: 2
  folders:
    type: Directory[]?
    inputBinding:
      position: 3
  destination:
    type: string
    inputBinding:
      position: 1

baseCommand: [bash, -x, mv.sh]

outputs:
  results:
    type: Directory
    outputBinding:
      glob: $(inputs.destination)

The tool above fails most of the time: instead of the real generated files, the output directory contains only broken symlinks that point to files in a tmpdir that no longer exists. Sometimes the cwltool run does finish successfully and the log shows the mv step completing without error, but the outputs folder (destination) still contains only broken symlinks. A typical error:

ERROR Unhandled error, try again with --debug for more information:
  [Errno 2] No such file or directory: '/data/home/username/tmp/531ybmnk/aaaaaaa_illuminaQC_illumina_filtered_bbduk-summary.txt'
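
To check whether a directory is affected, something like the following could be used. It is only a rough diagnostic sketch, not part of cwltool: the function name find_absolute_symlinks is made up for illustration, and it simply walks a directory and reports every symlink whose stored target is an absolute path, along with whether that target is currently missing.

```python
import os


def find_absolute_symlinks(directory):
    """Return (link_path, target, dangling) for each symlink with an absolute target."""
    found = []
    # os.walk does not follow symlinks by default; links to directories show up
    # in dirnames, links to files and broken links show up in filenames.
    for current, dirnames, filenames in os.walk(directory):
        for name in dirnames + filenames:
            path = os.path.join(current, name)
            if os.path.islink(path):
                target = os.readlink(path)
                if os.path.isabs(target):
                    # os.path.exists follows the link, so False means dangling.
                    found.append((path, target, not os.path.exists(path)))
    return found


if __name__ == "__main__":
    import sys

    for link, target, dangling in find_absolute_symlinks(sys.argv[1]):
        status = "BROKEN" if dangling else "ok"
        print(f"{status}\t{link} -> {target}")
```

Pointing it at the destination directory after a run should list the links that still refer to the removed tmpdir.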

FYI, this is a rewrite of an older CWL script that served the same purpose. We rewrote it to minimize the use of JavaScript expressions, as suggested in the workflow description. The old version used a JavaScript expression:

requirements:
 - class: InlineJavascriptRequirement

inputs:
  files:
    type: File[]?
  folders:
    type: Directory[]?
  destination:
    type: string

expression: |
  ${
    var array = []
    if (inputs.files != null) {
      array = array.concat(inputs.files)
    }
    if (inputs.folders != null) {
      array = array.concat(inputs.folders)
    }
    var r = {
       'results':
         { "class": "Directory",
           "basename": inputs.destination,
           "listing": array
         } 
       };
     return r; 
   }

outputs:
  results:
    type: Directory

Thanks!