Reed-CompBio / spras

Signaling Pathway Reconstruction Analysis Streamliner (SPRAS)
MIT License

Support Snakemake workflows on HTCondor with file transfer #91

Open · agitter opened this issue 1 year ago

agitter commented 1 year ago

I discussed barriers to running SPRAS in a high-throughput computing environment with @bbockelm and @jhiemstrawisc. One challenge is that each algorithm's run function executes some Python code to set up a Docker run call (or the Singularity equivalent) and then makes that Docker run call. It would be better to have an isolated, modular task to execute entirely inside the container environment.
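To illustrate the kind of separation being proposed, here is a minimal sketch that only constructs the `docker run` argument list and leaves executing it to a separate step. Because the builder has no side effects, the container call becomes a self-contained unit that another system could run. The function names, the `/spras/input` and `/spras/output` mount points, and the image name in the example are hypothetical assumptions for illustration, not the SPRAS API.

```python
import subprocess
from pathlib import Path


def build_run_command(image: str, args: list[str], input_dir: Path, output_dir: Path) -> list[str]:
    """Return a `docker run` argv without executing anything (hypothetical helper)."""
    return [
        "docker", "run", "--rm",
        # One bind mount for all inputs and one for all outputs; the
        # /spras/... container paths are assumed, not SPRAS's actual layout.
        "-v", f"{input_dir.resolve()}:/spras/input",
        "-v", f"{output_dir.resolve()}:/spras/output",
        image,
        *args,
    ]


def execute(cmd: list[str]) -> None:
    """Run a previously built command; kept separate so the runner can be swapped."""
    subprocess.run(cmd, check=True)
```

Keeping the command as plain data makes it straightforward to substitute an equivalent Singularity invocation or to hand the pieces to a scheduler instead of calling `execute` locally.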

If we are going to prototype this in the UW-Madison Center for High Throughput Computing (CHTC), we will still need pre- and post-processing around the Docker run call. That includes moving files, because all input files end up in a single directory and all output files end up in a single directory. The pre- and post-processing could take place before and after the modular run call, instead of being bundled together with it as in the current implementation.
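The pre- and post-processing steps described above could look roughly like the following sketch: copy each input into one flat directory before the run, then move files out of the single output directory afterward. The helper names and the `<name>_<basename>` naming scheme are hypothetical choices made for this illustration.

```python
import shutil
from pathlib import Path


def stage_inputs(files: dict[str, Path], input_dir: Path) -> dict[str, str]:
    """Pre-processing: copy each named input file into one flat directory.

    Returns a mapping from the logical input name to its file name in input_dir.
    """
    input_dir.mkdir(parents=True, exist_ok=True)
    staged = {}
    for name, src in files.items():
        # Prefix with the logical name so inputs sharing a base name don't collide.
        dest_name = f"{name}_{src.name}"
        shutil.copy2(src, input_dir / dest_name)
        staged[name] = dest_name
    return staged


def collect_outputs(output_dir: Path, targets: dict[str, Path]) -> None:
    """Post-processing: move files from the single output directory to their final paths."""
    for produced_name, dest in targets.items():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(output_dir / produced_name), str(dest))
```

With this split, the container step in the middle only ever sees one input directory and one output directory, which is the shape a file-transfer system expects.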

@jhiemstrawisc may help me explore making this part of the code more modular.

A separate topic, which likely deserves its own issue later, is to look into the HTCondor Snakemake profile.

agitter commented 1 year ago

@jhiemstrawisc and I discussed this further and looked at Omics Integrator 1 as an example. As a next step, he will test running our Omics Integrator 1 Docker image in CHTC independent of SPRAS to confirm that works. Then, we'll plan how to prepare the Docker run call in a compatible way.
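For a standalone test of a Docker image in CHTC, the job needs a submit description that names the container image and lists which files to transfer. The sketch below assembles one as a string, assuming HTCondor's container universe; the image name, executable path, and resource requests are placeholders, and the submit file CHTC actually requires may differ.

```python
def submit_description(image: str, executable: str, arguments: str,
                       inputs: list[str], outputs: list[str]) -> str:
    """Build an HTCondor submit description for one containerized job (sketch)."""
    lines = [
        "universe = container",
        f"container_image = docker://{image}",
        f"executable = {executable}",
        # The executable is assumed to live inside the image, so don't transfer it.
        "transfer_executable = false",
        f"arguments = {arguments}",
        # Ship inputs to the execute point and bring the named outputs back.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"transfer_input_files = {', '.join(inputs)}",
        f"transfer_output_files = {', '.join(outputs)}",
        # Placeholder resource requests; tune for the real workload.
        "request_cpus = 1",
        "request_memory = 2GB",
        "request_disk = 2GB",
        "log = job.log",
        "output = job.out",
        "error = job.err",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

The resulting text could be written to a `.sub` file and submitted with `condor_submit`.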

The motivation for our current, somewhat complicated approach to setting up the Docker run calls, which maps many individual input and output files into the container, is described in https://github.com/Reed-CompBio/spras/pull/49#issuecomment-1051192702.
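As a rough illustration of what mapping many individual files involves, the sketch below gives each distinct parent directory of the host files its own bind mount at a unique container path and rewrites each file path accordingly. This is a hypothetical helper written for this discussion, not SPRAS's actual implementation, and the `/spras` prefix is an assumption.

```python
from pathlib import PurePosixPath


def map_files(host_files: list[str], prefix: str = "/spras") -> tuple[list[str], list[str]]:
    """Return (docker -v arguments, container paths) for the given host files.

    Each distinct parent directory gets one mount under `prefix`, so files
    spread across many directories are all reachable inside the container.
    """
    mounts: dict[str, str] = {}  # host directory -> container directory
    volume_args: list[str] = []
    container_paths: list[str] = []
    for f in host_files:
        path = PurePosixPath(f)
        host_dir = str(path.parent)
        if host_dir not in mounts:
            mounts[host_dir] = f"{prefix}/mount{len(mounts)}"
            volume_args += ["-v", f"{host_dir}:{mounts[host_dir]}"]
        container_paths.append(f"{mounts[host_dir]}/{path.name}")
    return volume_args, container_paths
```

Docker requires absolute host paths for bind mounts, so relative paths would need to be resolved first. Every one of these mounts is something a file-transfer system would have to reproduce, which is part of why the current approach is hard to move to HTCondor.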

agitter commented 1 year ago

I renamed the issue from "Modularize container run calls within algorithm run step" to "Support Snakemake workflows on HTCondor with file transfer" because container run calls are only one aspect of what @jhiemstrawisc is working on. He has made progress with generic Snakemake workflows and an HTCondor Snakemake executor https://github.com/snakemake/snakemake/issues/2405.

A current challenge is that running an individual rule remotely on an HTCondor execute point fails when the Snakefile tries to import the runner module. We believe having an installable Python package would solve this; see #86.
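A small preflight check could make this failure mode explicit: before running a rule on an execute point, confirm that the modules the Snakefile imports are importable there. The helper below is a hypothetical sketch built on `importlib.util.find_spec` and handles top-level module names only. An installable package addresses the root cause, since installing it on the execute point or in the container image puts it on the import path.

```python
import importlib.util


def check_importable(*modules: str) -> list[str]:
    """Return the names of top-level modules that cannot be imported here."""
    missing = []
    for name in modules:
        # find_spec returns None for a top-level module that isn't on sys.path.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing
```

For example, a job wrapper could call `check_importable("runner")` and exit with a clear message if the result is non-empty, rather than failing partway through the Snakefile.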