Closed by kubu4 3 years ago
Snakemake has a --conda-create-envs-only flag that is designed for this scenario:
Conda deployment also works well for offline or air-gapped environments. Running snakemake --use-conda --conda-create-envs-only will only install the required conda environments without running the full workflow. Subsequent runs with --use-conda will make use of the local environments without requiring internet access.
(from the Snakemake docs)
So hopefully you will be able to run this on the head node to create the environments before submitting the pipeline.
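The two-phase approach described above can be sketched as a small Python helper that builds both command lines. This is only an illustration: the file names Snakefile and config.yaml and the helper name offline_commands are hypothetical, and the flags used are the standard Snakemake options --snakefile, --configfile, --use-conda, --cores and --conda-create-envs-only.

```python
import shlex


def offline_commands(snakefile, configfile, cores=1):
    """Return (prepare, run) argument lists for a two-phase offline run."""
    base = [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", configfile,
        "--use-conda",
        "--cores", str(cores),
    ]
    # Phase 1 (head node, online): build the conda environments and stop.
    prepare = base + ["--conda-create-envs-only"]
    # Phase 2 (compute node, offline): reuse the environments built in phase 1.
    run = list(base)
    return prepare, run


if __name__ == "__main__":
    # Hypothetical paths; substitute your own Snakefile and config file.
    prepare, run = offline_commands("Snakefile", "config.yaml", cores=4)
    print("head node:   ", shlex.join(prepare))
    print("compute node:", shlex.join(run))
```

Both commands should be run from the same working directory, because by default Snakemake stores the environments under .snakemake/conda there (unless --conda-prefix is set), and the second phase can only reuse them if it looks in the same place.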
The Docker image has all of the dependencies pre-installed (which makes it rather a large image) so shouldn't require any internet access.
Thanks! That got me past the conda environment installs!
However, now the job is dying because it's trying to download the local assembly from NCBI, lineages from BUSCO, and databases from UniProt, despite this being a local assembly. I see in Issue #6 that the person manually commented out all of the fetch commands in Snakefile_v2. Is that the solution to this? Also, where might I find that file? I can't find it in the .snakemake/ directory (which is in my working directory).
where might I find that file?
Found it:
blobtoolkit/insdc-pipeline/Snakefile
Have commented out the various fetch rules. Fingers crossed...
This seems to eliminate errors related to not having internet access.
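For anyone who would rather script this workaround than edit the file by hand, here is a minimal Python sketch that comments out whole rules by name. It assumes the rules to disable share a name prefix (fetch_ here is a hypothetical example; check the actual rule names in the Snakefile) and that each rule body is indented under its header. The function comment_out_rules is illustrative and not part of the pipeline.

```python
import re

# A top-level "rule <name>:" or "checkpoint <name>:" header line.
RULE_HEADER = re.compile(r"^(?:rule|checkpoint)\s+(\w+)\s*:")


def comment_out_rules(text, prefix="fetch_"):
    """Comment out every top-level rule whose name starts with `prefix`."""
    out = []
    in_target = False
    for line in text.splitlines(keepends=True):
        header = RULE_HEADER.match(line)
        if header:
            # Entering a new rule: decide whether it should be disabled.
            in_target = header.group(1).startswith(prefix)
        elif line.strip() and not line[0].isspace() and not line.startswith("#"):
            # Any other unindented statement ends the current rule.
            in_target = False
        if in_target and line.strip():
            out.append("# " + line)
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    demo = (
        "rule fetch_example:\n"
        "    output: 'db'\n"
        "\n"
        "rule keep_me:\n"
        "    input: 'db'\n"
    )
    print(comment_out_rules(demo), end="")
```

Keep a backup copy of the original Snakefile before applying anything like this. Disabling a rule only works if the files it would have produced already exist locally; otherwise Snakemake will report missing inputs for the downstream rules.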
I'm attempting to run the pipeline on a computing cluster at the Univ. of Washington. The computing nodes that are controlled via SLURM do not have internet access. As such, when running the pipeline, I get the following error:
Is there a means by which to get around the need for internet access?
I see this seems to be related to setting up the BUSCO conda environment. Could I set up the environment manually and then run the pipeline? I suspect the pipeline will still attempt to connect to the internet, even if the environment has already been set up, but I'm not positive.
Are there other similar sticking points that will require internet access when running a local pipeline?
Finally, I suppose the real answer to all of this is to run this in a container (Docker/Singularity), but I'm not entirely sure whether those will also require internet access during the process.