Toil drops execute permissions on files it imports, leading to: OSError: permission denied (for CWL Directory with script inside)

ionox0 commented 4 years ago

Hi all, I was hoping to get some info related to CWL / toil support for Directory types with a script inside.

Specifically we have a (possibly overly-complicated) setup with a bash script located inside of a Directory, and that Directory is used as an input to the workflow.

One of the tools in the workflow is then supposed to call the script, however it seems the permissions of the script are altered to prevent execution:

# View permissions of symlink to script in toil tmpdir:
$ ll ./tmpfagsag/stg3d63e6a4-2d71-4c9f-9262-76ec1a6aa820/ACCESS_SV/scripts/iAnnotateSV.sh

lrwxrwxrwx 1 accessbot access 247 Aug 20 11:42 ./tmpfagsag/stg3d63e6a4-2d71-4c9f-9262-76ec1a6aa820/ACCESS_SV/scripts/iAnnotateSV.sh -> /project_XXX/tmp/toil-c0480c31-ca69-40d7-b559-c8b7b2412aca-14e14258-9b46-4ae7-b0e0-f48d45462195/tmpoQLWax/6df530f0-b28e-458a-9d10-999543a9383b/tmpiP7Mn8.tmp

# View permissions of script itself in toil tmpdir:
$ ll /project_XXX/tmp/toil-c0480c31-ca69-40d7-b559-c8b7b2412aca-14e14258-9b46-4ae7-b0e0-f48d45462195/tmpoQLWax/6df530f0-b28e-458a-9d10-999543a9383b/tmpiP7Mn8.tmp

-rw-r--r-- 1 accessbot access 3.2K Aug 20 11:42 /project_XXX/tmp/toil-c0480c31-ca69-40d7-b559-c8b7b2412aca-14e14258-9b46-4ae7-b0e0-f48d45462195/tmpoQLWax/6df530f0-b28e-458a-9d10-999543a9383b/tmpiP7Mn8.tmp

# Try to execute it
$ /project_XXX/tmp/toil-c0480c31-ca69-40d7-b559-c8b7b2412aca-14e14258-9b46-4ae7-b0e0-f48d45462195/tmpoQLWax/6df530f0-b28e-458a-9d10-999543a9383b/tmpiP7Mn8.tmp

zsh: permission denied: /project_XXX/tmp/toil-c0480c31-ca69-40d7-b559-c8b7b2412aca-14e14258-9b46-4ae7-b0e0-f48d45462195/tmpoQLWax/6df530f0-b28e-458a-9d10-999543a9383b/tmpiP7Mn8.tmp

This is how the error looks from the cwltoil logs:

INFO:toil.worker:---TOIL WORKER OUTPUT LOG---
INFO:toil:Running Toil version 3.21.1-afbb7957778d805d2149cdbe4dd2c9645d021ffa-dirty.
[job manta_annotation.cwl] /project_XXX/tmp/toil-c0480c31-ca69-40d7-b559-c8b7b2412aca-71cbafe4-6fa1-4a8c-ad76-2587c0a73697/tmpPlxvgy/d3709ab8-a49d-4da3-8fda-d62cbb56f095/tPOeaQP/tmp-outtN2kr9$ /project_XXX/_fPhWIk_14382/tmpcsif3s/stgf742f8b1-18db-4946-893d-f63d39c23bba/ACCESS_SV/scripts/iAnnotateSV.sh \
    /project_XXX/_fPhWIk_14382/tmpcsif3s/stgc66fd049-38cd-41dc-a3a7-1424d8b980a8/somaticSV.vcf.gz \
    C-1NPV4P-L011-d \
    . \
    /project_XXX/_fPhWIk_14382/tmpcsif3s/stgaef93c86-92c4-446c-9325-efe38e9a078a/1.5.0 \
    /project_XXX/_fPhWIk_14382/tmpcsif3s/stgfb90cf2b-fc27-40ea-9e1b-2af9b6425672/Homo_sapiens_assembly19.fasta \
    /home/accessbot/miniconda3/envs/ACCESS_1.3.36/bin/python2 \
    /opt/common/CentOS_6/java/jdk1.8.0_31/bin/java
INFO:cwltool:[job manta_annotation.cwl] /project_XXX//tmp/toil-c0480c31-ca69-40d7-b559-c8b7b2412aca-71cbafe4-6fa1-4a8c-ad76-2587c0a73697/tmpPlxvgy/d3709ab8-a49d-4da3-8fda-d62cbb56f095/tPOeaQP/tmp-outtN2kr9$ /project_XXX/_fPhWIk_14382/tmpcsif3s/stgf742f8b1-18db-4946-893d-f63d39c23bba/ACCESS_SV/scripts/iAnnotateSV.sh \
    /project_XXX/_fPhWIk_14382/tmpcsif3s/stgc66fd049-38cd-41dc-a3a7-1424d8b980a8/somaticSV.vcf.gz \
    C-1NPV4P-L011-d \
    . \
    /project_XXX/_fPhWIk_14382/tmpcsif3s/stgaef93c86-92c4-446c-9325-efe38e9a078a/1.5.0 \
    /project_XXX/_fPhWIk_14382/tmpcsif3s/stgfb90cf2b-fc27-40ea-9e1b-2af9b6425672/Homo_sapiens_assembly19.fasta \
    /home/accessbot/miniconda3/envs/ACCESS_1.3.36/bin/python2 \
    /opt/common/CentOS_6/java/jdk1.8.0_31/bin/java
Exception while running job
Traceback (most recent call last):
  File "/home/accessbot/miniconda3/envs/ACCESS_1.3.40/lib/python2.7/site-packages/cwltool/job.py", line 325, in _execute
    monitor_function=monitor_function
  File "/home/accessbot/miniconda3/envs/ACCESS_1.3.40/lib/python2.7/site-packages/cwltool/job.py", line 785, in _job_popen
    cwd=cwd)
  File "/home/accessbot/miniconda3/envs/ACCESS_1.3.40/lib/python2.7/site-packages/subprocess32.py", line 614, in __init__
    restore_signals, start_new_session)
  File "/home/accessbot/miniconda3/envs/ACCESS_1.3.40/lib/python2.7/site-packages/subprocess32.py", line 1393, in _execute_child
    raise child_exception_type(errno_num, err_msg)
OSError: [Errno 13] Permission denied
ERROR:cwltool:Exception while running job
Traceback (most recent call last):
  File "/home/accessbot/miniconda3/envs/ACCESS_1.3.40/lib/python2.7/site-packages/cwltool/job.py", line 325, in _execute
    monitor_function=monitor_function
  File "/home/accessbot/miniconda3/envs/ACCESS_1.3.40/lib/python2.7/site-packages/cwltool/job.py", line 785, in _job_popen
    cwd=cwd)
  File "/home/accessbot/miniconda3/envs/ACCESS_1.3.40/lib/python2.7/site-packages/subprocess32.py", line 614, in __init__
    restore_signals, start_new_session)
  File "/home/accessbot/miniconda3/envs/ACCESS_1.3.40/lib/python2.7/site-packages/subprocess32.py", line 1393, in _execute_child
    raise child_exception_type(errno_num, err_msg)
OSError: [Errno 13] Permission denied

I'm wondering if there is a way to set permissions of this script that is an input to the pipeline, or perhaps I should try with --linkImports which would link instead of copying the script.

Toil version: 3.21.1 (note we are using our own fork https://github.com/mskcc/toil which is based off of the 3.21.1 upstream release)

cwltool version: 1.0.20190906054215

┆Issue is synchronized with this Jira Task ┆Issue Number: TOIL-600

adamnovak commented 4 years ago

With a file job store, --linkImports might solve the problem.

In general Toil makes complete copies of files itself and doesn't preserve ownership or permissions. If you're writing a Toil script yourself in Python it's easy to chmod the file you just downloaded, but in CWL it's harder.

We probably should preserve permissions in all of our FileStores (and thus somehow the backing JobStores).

adamnovak commented 3 years ago

I think the way to approach this is, first, to specify that permissions need to be preserved in the docs for all the user-facing functions that put files into or take files out of the file store:

importFile and exportFile on the Toil class
readGlobalFile and writeGlobalFile on the AbstractFileStore class

The descriptions for the functions that upload files in should say that, if an executable file on the local filesystem is uploaded, its executability will be preserved when it is downloaded again. The functions that download files should say that the file that is downloaded will be executable if it was originally uploaded from an executable file on the local filesystem.

The next step would be to write some tests for this functionality (which won't pass yet):

For importFile and exportFile, you might want to add a test next to this one that makes a file with or without the executable permission set, imports and then immediately re-exports it to a different path, and checks to make sure that the executable flag on the new file is the right value.
For writeGlobalFile and readGlobalFile, you probably want to add a test next to this one that, again, writes and re-reads a file and makes sure that the executable flag is set correctly, but this time using the file store and writeGlobalFile and readGlobalFile.

To set the executable flag on a file, for the tests or the final implementation, you would first use the st_mode field of the result of os.stat() on the file to get the permissions. Then you would take the current permissions and bitwise OR in stat.S_IXUSR, which represents the file owner's permission to execute it, with Python's | operator. Finally, you would use os.chmod to apply the new permission bits. For an example (using the fchmod and fstat functions that work on open files instead of paths), see this StackOverflow answer. To check the executable flag on the file, you should be able to bitwise AND the current permissions bits with stat.S_IXUSR; a nonzero value will indicate that the executable flag is set.

(For more background on Unix file permission bits, see here. They're often written as 3-digit octal (base 8) numbers, and they're stored as bits in a number where each bit has a specific meaning. When we talk about "the executable flag", we really mean the file owner's execute permission bit.)

Having written tests, you can then think about the actual implementation. For storing whether the file is executable or not, I would recommend adding another field to the FileID type, next to where we store the size of the file, for storing the permissions that the file ought to have (i.e. whether it is executable or not). Since FileID provides a fromPath that automatically finds file sizes when a file on disk is being uploaded, that might be the right place to put the code for finding whether the executable flag is set as well. Than you'd have to go to where the Toil class, CachingFileStore, and NonCachingFileStore create their FileIDs for on-disk files (here and here for the file stores), possibly chase calls until you find the actual FileID() constructor calls, and make sure the executable bit is being read and stored. (For Toil.importFile() things might be a bit tricky, because if it isn't writing to a file job store it will fall back on just streaming the data to a new empty file in the job store. You may need to add some new arguments to some existing functions to pass the executable bit through, or just set it yourself after the destination gives you a FileID.)

Then on the output side, you'd need to add some code to the downloading functions to check and see if they got a FileID that says the file is supposed to be executable, and if so set the flag on the file. For the CachingFileStore, we will probably have to touch all there of the downloading functions called from here to make them set permissions; the atomic_copy function that they in tuen call ought to already preserve permissions when just copying things around the filesystem.

One thing to watch out for on the read side is that some workflows like Cactus will just pass strings instead of actual FileIDs; in those cases we can just say that the downloaded file won't be executable, because the workflow is breaking the rules.

DataBiosphere / toil

Toil drops execute permissions on files it imports, leading to: OSError: permission denied (for CWL Directory with script inside) #3146