Open agitter opened 1 year ago
@jhiemstrawisc and I discussed this further and looked at Omics Integrator 1 as an example. As a next step, he will test running our Omics Integrator 1 Docker image in CHTC independent of SPRAS to confirm that works. Then, we'll plan how to prepare the Docker run call in a compatible way.
Our current approach sets up the Docker run calls in a complicated way that maps many individual input and output files. The motivation for it is described here: https://github.com/Reed-CompBio/spras/pull/49#issuecomment-1051192702.
I renamed the issue from "Modularize container run calls within algorithm run step" to "Support Snakemake workflows on HTCondor with file transfer" because container run calls are only one aspect of what @jhiemstrawisc is working on. He has made progress with generic Snakemake workflows and an HTCondor Snakemake executor https://github.com/snakemake/snakemake/issues/2405.
A current challenge is that running an individual rule remotely on an HTCondor execute point fails when the Snakefile tries to import runner. We believe having an installable Python package would solve this #86.
I discussed barriers to running SPRAS in a high-throughput computing environment with @bbockelm and @jhiemstrawisc. One challenge is that each algorithm's run function executes some Python code to set up a Docker run call (or the Singularity equivalent) and then makes that Docker run call. It would be better to have an isolated, modular task to execute entirely inside the container environment.
If we are going to prototype this in the UW-Madison Center for High Throughput Computing, we will still need pre- and post-processing for the Docker run call. That would include moving files because all input files end up in one directory and all output files end up in one directory. The pre- and post-processing could take place before and after the modular run call instead of the current implementation that places them all together.
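To make the proposed split concrete, here is a minimal sketch of pre-processing, a modular run step, and post-processing as three separate pieces. It is not the SPRAS implementation: the function names (`stage_inputs`, `collect_outputs`, `run_modular`) and the `task` callable are hypothetical, and `task` only stands in for whatever would execute entirely inside the container.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, List


def stage_inputs(inputs: List[Path], staging_dir: Path) -> Dict[Path, Path]:
    """Pre-processing: copy every input file into one flat directory."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    mapping: Dict[Path, Path] = {}
    for src in inputs:
        dest = staging_dir / src.name
        # A flat directory cannot hold two files with the same name.
        if dest in mapping.values():
            raise ValueError(f"duplicate input file name: {src.name}")
        shutil.copy2(src, dest)
        mapping[src] = dest
    return mapping


def collect_outputs(output_dir: Path, destinations: Dict[str, Path]) -> None:
    """Post-processing: move files out of the flat output directory.

    `destinations` maps a file name inside `output_dir` to its final path.
    """
    for name, final_path in destinations.items():
        final_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(output_dir / name), str(final_path))


def run_modular(
    inputs: List[Path],
    destinations: Dict[str, Path],
    task: Callable[[Path, Path], None],
    work_dir: Path,
) -> None:
    """Run pre-processing, the isolated task, then post-processing."""
    in_dir = work_dir / "input"
    out_dir = work_dir / "output"
    out_dir.mkdir(parents=True, exist_ok=True)
    stage_inputs(inputs, in_dir)
    # Hypothetical stand-in for the modular call that would run
    # entirely inside the container and see only in_dir and out_dir.
    task(in_dir, out_dir)
    collect_outputs(out_dir, destinations)
```

In this sketch the middle step sees only one input directory and one output directory, which is the shape that file transfer to an execute point would need. The file moves happen before and after it rather than being interleaved with building the run call.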
@jhiemstrawisc may help me explore making this part of the code more modular.
A separate topic (which likely deserves its own issue later) is to look into the HTCondor Snakemake profile.