Open DailyDreaming opened 3 years ago
Some way of invoking singularity with an arbitrary number of binds
I think the `SINGULARITY_BIND` option recommended by @matmanc should work. Here's a first attempt at that: https://github.com/common-workflow-language/cwltool/pull/1386
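As a rough illustration of the `SINGULARITY_BIND` approach (a sketch, not the PR's actual code; the helper name is made up), the binds can be folded into a single environment variable instead of repeated `--bind` flags:

```python
import os

def singularity_env_with_binds(binds):
    """Build an environment for a singularity subprocess that passes bind
    mounts via SINGULARITY_BIND instead of repeated --bind flags.

    `binds` is a list of (src, dst, mode) tuples; mode is 'ro' or 'rw'.
    Hypothetical helper, for illustration only."""
    env = dict(os.environ)
    # Singularity accepts a comma-separated list of src:dst:opts specs
    env['SINGULARITY_BIND'] = ','.join(
        f'{src}:{dst}:{mode}' for src, dst, mode in binds)
    return env

env = singularity_env_with_binds([('/data/in', '/mnt/in', 'ro'),
                                  ('/data/out', '/mnt/out', 'rw')])
```

Note this only moves the byte budget from argv into the environment, which (as discussed below) is exactly why it can still hit the same limit.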
@DailyDreaming it was pointed out by @tetron that my approach won't work; it will run into the same E2BIG error: https://github.com/common-workflow-language/cwltool/pull/1386#issuecomment-739333597
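For context on E2BIG: on Linux the combined size of argv plus the environment shares one budget (roughly a quarter of the stack rlimit), queryable via `sysconf`. A minimal estimator, as a sketch only:

```python
import os

# Total budget for argv + environment strings on this system
ARG_MAX = os.sysconf('SC_ARG_MAX')

def fits_in_arg_limit(args, env):
    """Rough estimate of whether execve() would fail with E2BIG.

    Sketch only: the kernel also charges pointer bookkeeping and enforces
    a per-string cap, so this slightly under-counts the real usage."""
    total = sum(len(a) + 1 for a in args)                       # 'arg\0'
    total += sum(len(k) + len(v) + 2 for k, v in env.items())   # 'K=V\0'
    return total < ARG_MAX
```

This is why moving thousands of bind specs from `--bind` flags into `SINGULARITY_BIND` doesn't help: both sides of the sum count against the same `ARG_MAX`.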
A better solution overall would be to create a hardlink tree so that only the base directory needs to be mounted into the container.
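The hardlink-tree idea could look roughly like this (a sketch assuming the files and the staging directory live on the same filesystem, which hard links require; the function name is illustrative, not cwltool's API):

```python
import os
import tempfile

def build_hardlink_tree(files):
    """Hard-link every file into one staging directory so that only that
    single directory needs a bind mount, however many files there are.

    Illustrative sketch: one numbered subdirectory per file avoids
    basename collisions between files from different source directories."""
    staging = tempfile.mkdtemp(prefix='binds_')
    for i, path in enumerate(files):
        subdir = os.path.join(staging, str(i))
        os.mkdir(subdir)
        os.link(path, os.path.join(subdir, os.path.basename(path)))
    return staging
```

The container would then see every input under one mount point, and the argv cost of the bind mounts is constant instead of linear in the file count.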
@mr-c @tetron I'm testing on a scaled-down version of @matmanc's example test https://cwl.discourse.group/t/too-many-arguments-on-the-command-line/248/20 (changing `for i in {1..2024}` to `for i in {1..20}` in `create_file.cwl`). `cwltool` seems to run this successfully in about a minute, while `toil` takes 24 minutes to fail.
Adding batching by directory in Toil may help, and makes sense to me. I brought this up with @adamnovak, and he said that they already batch directories for Toil in `vg` and sent the following link: https://github.com/vgteam/toil-vg/blob/295ea704cf64e8673a21a04fcf063ce0ee08d29f/src/toil_vg/iostore.py#L79
I think we should try to skip the compression, but I think implementing directory batching may help solve both:
- Speeding up toil's import of overly populous directories.
- Submitting a less verbose arg set of bind mounts.
I think on both the `toil` and `cwltool` sides, we should attempt to un-restrict the default 8 MB stack limit when allowed to do so, and thus the derived 1/4-of-stack cap (2 MB) on the size of CLI commands, especially since @tetron mentioned his research led him to believe that the env vars share the same memory space: https://github.com/common-workflow-language/cwltool/pull/1386#issuecomment-739333597
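The directory-batching idea could be sketched like this (a hypothetical helper, not toil-vg's iostore code): collapse per-file bind mounts into one bind per source directory, promoting the whole directory to read-write if any file in it needs writing:

```python
import os
from collections import defaultdict

def batch_binds_by_directory(locations):
    """Collapse per-file bind mounts into per-directory ones.

    Sketch under the assumption that exposing a whole source directory at
    the corresponding target directory is acceptable for every file in it.
    `locations` is a list of (src, dst, mode) tuples with mode 'ro'/'rw'."""
    groups = defaultdict(set)
    for src, dst, mode in locations:
        groups[(os.path.dirname(src), os.path.dirname(dst))].add(mode)
    binds = []
    for (src_dir, dst_dir), modes in sorted(groups.items()):
        # if any file in the group needs write access, mount read-write
        mode = 'rw' if 'rw' in modes else 'ro'
        binds.append(f'{src_dir}:{dst_dir}:{mode}')
    return binds
```

With matmanc's example, 2024 per-file binds in one directory would collapse to a single argument.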
Seems like tarring it up and then untarring it later amounts to the same amount of I/O as copying all the files out of the file store to reconstruct the directory, so I don't think using a tar file is a good general solution. Copying into a temporary directory tree is easy, and it could use hard links or symlinks if we want to get a bit more clever.
@tetron Hmmm... I'll defer to your intuition, and I agree that if it were simply writing the tar vs. writing the recursive directory, it would be roughly the same I/O. I was just thinking that in Toil/Python we size each file and convert it to a FileID object before writing each file to the job store individually, and I was hoping that dropping that overhead (especially sizing the individual files) might improve things.
If I understand correctly, the temporary directory tree would only need to be implemented on the `cwltool` side? I assume this would look something like (very roughly):
import os
import tempfile

associations = {}
associated_tmp_dirs = {}
existing_bind_mount_args = set()

for src, dst, read_write in locations:
    # make a unique tmpdir for each basedir being mounted
    base_dir = os.path.dirname(src)
    if base_dir not in associated_tmp_dirs:
        associated_tmp_dirs[base_dir] = tempfile.mkdtemp()
    temp_dir = associated_tmp_dirs[base_dir]
    associations[src] = {'src_dir': temp_dir, 'dst': dst}
    bind_arg = f'{base_dir}:{temp_dir}:{read_write}'
    if bind_arg not in existing_bind_mount_args:
        existing_bind_mount_args.add(bind_arg)
        add_bind_mount(bind_arg)

run_hard_link_from_tmp_dir_to_real_locations_inside_of_container(associations)
I'll try to open a PR to this effect.
@tetron Will try to push the PR sometime tomorrow. Right now I'm attempting to group files with a common basedir together, create a tempdir, hardlink the files into the tempdir, and then bind mount a minimal set of tempdirs (and original dirs if they weren't files) to the file directories that Singularity originally wanted to find the files in.
We have hit this issue as well, and I was wondering if it might be easier to bind the whole jobstore and workdir folders, instead of binding each file inside of them individually.
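Binding the whole jobstore and workdir keeps the argument list constant-size regardless of how many files they contain; a minimal sketch of what that command line could look like (paths and helper name are illustrative, not Toil's actual API):

```python
def whole_tree_binds(job_store_dir, work_dir):
    """Two --bind flags total, independent of the number of files inside
    the bound directories. Illustrative helper only."""
    return ['--bind', f'{job_store_dir}:{job_store_dir}:rw',
            '--bind', f'{work_dir}:{work_dir}:rw']

cmd = (['singularity', 'exec']
       + whole_tree_binds('/tmp/jobstore', '/tmp/work')
       + ['image.sif', 'my_tool'])
```

The trade-off is exposing everything in those trees to the container, rather than just the files a given step declares as inputs.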
Discussion and original issue is here: https://cwl.discourse.group/t/too-many-arguments-on-the-command-line/248/2
Short summary: if I use a directory array (Directory[]) as input to a step and I run the workflow with Singularity, then when the file list is too long I get a "Too many arguments on the command line" error. I am currently running the workflow with Toil.
Issue is synchronized with this Jira Story. Issue Number: TOIL-738