Closed ionox0 closed 3 years ago
With a file job store, --linkImports might solve the problem.
In general Toil makes complete copies of files itself and doesn't preserve ownership or permissions. If you're writing a Toil script yourself in Python it's easy to chmod the file you just downloaded, but in CWL it's harder.
We probably should preserve permissions in all of our FileStores (and thus somehow the backing JobStores).
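As a rough illustration of why a plain copy loses the flag, here is a minimal sketch using only the Python standard library on a POSIX system. It is not Toil's code and makes no claim about how any particular job store copies files; the file names and the is_executable helper are made up for the example. It only shows that a content-only copy produces a non-executable file, while a copy that carries the mode bits does not.

```python
import os
import shutil
import stat
import tempfile


def is_executable(path):
    """Return True if the file owner's execute bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical input script, marked executable by its owner.
    script = os.path.join(tmp, "run.sh")
    with open(script, "w") as f:
        f.write("#!/bin/sh\necho hello\n")
    os.chmod(script, 0o755)

    # Content-only copy: the destination is created with default permissions.
    plain_copy = shutil.copyfile(script, os.path.join(tmp, "plain_copy.sh"))

    # Copy that also carries over the permission bits.
    mode_copy = shutil.copy(script, os.path.join(tmp, "mode_copy.sh"))

    results = (
        is_executable(script),
        is_executable(plain_copy),
        is_executable(mode_copy),
    )
```

Any copy path that moves only the bytes behaves like the first case, which matches the symptom reported in this issue.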
I think the way to approach this is, first, to specify that permissions need to be preserved in the docs for all the user-facing functions that put files into or take files out of the file store:
- importFile and exportFile on the Toil class
- readGlobalFile and writeGlobalFile on the AbstractFileStore class
The descriptions for the functions that upload files in should say that, if an executable file on the local filesystem is uploaded, its executability will be preserved when it is downloaded again. The functions that download files should say that the file that is downloaded will be executable if it was originally uploaded from an executable file on the local filesystem.
The next step would be to write some tests for this functionality (which won't pass yet):
- For importFile and exportFile, you might want to add a test next to this one that makes a file with or without the executable permission set, imports and then immediately re-exports it to a different path, and checks to make sure that the executable flag on the new file is the right value.
- For writeGlobalFile and readGlobalFile, you probably want to add a test next to this one that, again, writes and re-reads a file and makes sure that the executable flag is set correctly, but this time using the file store and writeGlobalFile and readGlobalFile.
To set the executable flag on a file, for the tests or the final implementation, you would first use the st_mode field of the result of os.stat() on the file to get the permissions. Then you would take the current permissions and bitwise OR in stat.S_IXUSR, which represents the file owner's permission to execute it, with Python's | operator. Finally, you would use os.chmod to apply the new permission bits. For an example (using the fchmod and fstat functions that work on open files instead of paths), see this StackOverflow answer. To check the executable flag on the file, you should be able to bitwise AND the current permissions bits with stat.S_IXUSR; a nonzero value will indicate that the executable flag is set.
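The set and check steps described above can be sketched as follows. This is a minimal, path-based version using os.stat, os.chmod and stat.S_IXUSR; the helper names set_executable and is_executable are made up for the example, and it assumes a POSIX filesystem.

```python
import os
import stat
import tempfile


def set_executable(path):
    """Turn on the file owner's execute bit, leaving the other bits alone."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


def is_executable(path):
    """Return True if the file owner's execute bit is set."""
    return (os.stat(path).st_mode & stat.S_IXUSR) != 0


# Temporary files are created without any execute bits.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

try:
    before = is_executable(demo_path)
    set_executable(demo_path)
    after = is_executable(demo_path)
finally:
    os.remove(demo_path)
```

ORing the bit in, rather than assigning a fixed mode, keeps whatever read and write permissions the file already had.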
(For more background on Unix file permission bits, see here. They're often written as 3-digit octal (base 8) numbers, and they're stored as bits in a number where each bit has a specific meaning. When we talk about "the executable flag", we really mean the file owner's execute permission bit.)
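The round-trip tests suggested earlier could share a shape like the one below. This is only a sketch: check_round_trip and stand_in_round_trip are hypothetical names, and the stand-in uses shutil.copy in place of a real import and export through a job store, so it shows the structure of the check rather than an actual Toil test.

```python
import os
import shutil
import stat
import tempfile


def check_round_trip(round_trip, executable):
    """Make a file with or without the execute bit, pass it through
    round_trip(src, dst), and report whether the flag survived unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input")
        dst = os.path.join(tmp, "output")
        with open(src, "w") as f:
            f.write("payload\n")
        os.chmod(src, 0o700 if executable else 0o600)

        round_trip(src, dst)

        got = bool(os.stat(dst).st_mode & stat.S_IXUSR)
        return got == executable


def stand_in_round_trip(src, dst):
    """Placeholder for 'upload, then download to a new path'.
    A real test would go through the job store or file store here."""
    shutil.copy(src, dst)


outcomes = [check_round_trip(stand_in_round_trip, flag) for flag in (True, False)]
```

Running both the executable and the non-executable case matters, because an implementation that marks every downloaded file executable would pass the first case alone.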
Having written tests, you can then think about the actual implementation. For storing whether the file is executable or not, I would recommend adding another field to the FileID type, next to where we store the size of the file, for storing the permissions that the file ought to have (i.e. whether it is executable or not). Since FileID provides a fromPath that automatically finds file sizes when a file on disk is being uploaded, that might be the right place to put the code for finding whether the executable flag is set as well. Then you'd have to go to where the Toil class, CachingFileStore, and NonCachingFileStore create their FileIDs for on-disk files (here and here for the file stores), possibly chase calls until you find the actual FileID() constructor calls, and make sure the executable bit is being read and stored. (For Toil.importFile() things might be a bit tricky, because if it isn't writing to a file job store it will fall back on just streaming the data to a new empty file in the job store. You may need to add some new arguments to some existing functions to pass the executable bit through, or just set it yourself after the destination gives you a FileID.)
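To make the idea of an extra field concrete, here is a minimal sketch of a string-like ID that carries a size and an executable flag. SketchFileID and its for_path helper are hypothetical and are not Toil's FileID; the constructor arguments and attribute names are assumptions chosen for the example.

```python
import os
import stat
import tempfile


class SketchFileID(str):
    """Hypothetical stand-in for a file ID: behaves like the ID string,
    but also remembers the file's size and whether it was executable."""

    def __new__(cls, store_id, size, executable=False):
        # Only the ID itself becomes the string value.
        return super().__new__(cls, store_id)

    def __init__(self, store_id, size, executable=False):
        self.size = size
        self.executable = executable

    @classmethod
    def for_path(cls, store_id, local_path):
        """Read size and owner-execute bit from a file on disk in one stat call."""
        info = os.stat(local_path)
        return cls(store_id, info.st_size, bool(info.st_mode & stat.S_IXUSR))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tool.sh")
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, 0o755)

    # "hypothetical-id" stands in for whatever ID the job store hands back.
    file_id = SketchFileID.for_path("hypothetical-id", path)
```

Reading the flag in the same place as the size means every caller that already builds an ID from a path picks it up without further changes; the streaming case still needs the flag passed in explicitly.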
Then on the output side, you'd need to add some code to the downloading functions to check and see if they got a FileID that says the file is supposed to be executable, and if so set the flag on the file. For the CachingFileStore, we will probably have to touch all three of the downloading functions called from here to make them set permissions; the atomic_copy function that they in turn call ought to already preserve permissions when just copying things around the filesystem.
One thing to watch out for on the read side is that some workflows like Cactus will just pass strings instead of actual FileIDs; in those cases we can just say that the downloaded file won't be executable, because the workflow is breaking the rules.
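The read-side behaviour, including the plain-string case, might look something like the sketch below. apply_executable_flag and TaggedID are hypothetical names, and the executable attribute is an assumed field rather than something the current FileID is known to have; a bare string simply lacks it and is treated as not executable.

```python
import os
import stat
import tempfile


def apply_executable_flag(file_id, local_path):
    """After a download, set the owner-execute bit if the ID asks for it.
    Plain strings have no 'executable' attribute, so they default to False."""
    if getattr(file_id, "executable", False):
        mode = os.stat(local_path).st_mode
        os.chmod(local_path, mode | stat.S_IXUSR)


class TaggedID(str):
    """Hypothetical file ID that records an executable upload."""
    executable = True


with tempfile.TemporaryDirectory() as tmp:
    flags = []
    for file_id in (TaggedID("id-1"), "id-2"):
        # Simulate a freshly downloaded, non-executable file.
        path = os.path.join(tmp, file_id)
        with open(path, "w") as f:
            f.write("data\n")
        os.chmod(path, 0o600)

        apply_executable_flag(file_id, path)
        flags.append(bool(os.stat(path).st_mode & stat.S_IXUSR))
```

Using getattr with a default keeps string-passing workflows working exactly as they do today, at the cost of silently dropping the flag for them.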
Hi all, I was hoping to get some info related to CWL / toil support for Directory types with a script inside.
Specifically we have a (possibly overly-complicated) setup with a bash script located inside of a Directory, and that Directory is used as an input to the workflow.
One of the tools in the workflow is then supposed to call the script, however it seems the permissions of the script are altered to prevent execution:
This is how the error looks from the cwltoil logs:
I'm wondering if there is a way to set permissions of this script that is an input to the pipeline, or perhaps I should try with --linkImports which would link instead of copying the script.
Toil version: 3.21.1 (note we are using our own fork https://github.com/mskcc/toil which is based off of the 3.21.1 upstream release)
cwltool version: 1.0.20190906054215
┆Issue is synchronized with this Jira Task ┆Issue Number: TOIL-600