aryarm closed this issue 5 years ago.
Before we do this, we should split the pipeline into smaller steps, since Snakemake wants control over every small step in the pipeline so that it can schedule and manage each job. This will probably require breaking up each of the Python scripts.
Some of the Python code simply calls terminal commands, so we might consider converting that code to bash scripts, which Snakemake can more easily control and terminate.
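To illustrate why Snakemake wants fine-grained steps, here is a minimal plain-Python sketch (not Snakemake's own code) of how a workflow manager can infer the order of steps from their declared input and output files, then group independent steps so they can run at the same time. The `Step` class, the `parallel_waves` function, and the step and file names (`split_video`, `clip1.mp4`, and so on) are all hypothetical and do not come from this repository.

```python
# Hypothetical sketch (not Snakemake's implementation): infer step order
# from declared input/output files and group steps that can run in parallel.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def parallel_waves(steps: list[Step]) -> list[list[str]]:
    """Return lists of step names; each step depends only on earlier lists."""
    # Map every output file to the step that produces it.
    producer = {out: s.name for s in steps for out in s.outputs}
    # A step depends on the producers of its inputs; inputs with no
    # producer (e.g. raw data) are assumed to already exist.
    deps = {s.name: {producer[i] for i in s.inputs if i in producer} for s in steps}
    done: set[str] = set()
    waves: list[list[str]] = []
    while len(done) < len(steps):
        # Steps whose dependencies are all finished can run together.
        ready = sorted(n for n, d in deps.items() if n not in done and d <= done)
        if not ready:
            raise ValueError("dependency cycle between steps")
        waves.append(ready)
        done.update(ready)
    return waves


# Hypothetical steps for a video-tracking pipeline.
steps = [
    Step("split_video", ["raw.mp4"], ["clip1.mp4", "clip2.mp4"]),
    Step("track_clip1", ["clip1.mp4"], ["tracks1.csv"]),
    Step("track_clip2", ["clip2.mp4"], ["tracks2.csv"]),
    Step("merge_tracks", ["tracks1.csv", "tracks2.csv"], ["all_tracks.csv"]),
]
print(parallel_waves(steps))
# [['split_video'], ['track_clip1', 'track_clip2'], ['merge_tracks']]
```

If the whole pipeline were one monolithic script, there would be a single step and nothing to run in parallel; once it is split up, the two tracking steps land in the same wave and can be dispatched at the same time.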
I think I created a Snakemake pipeline for this in commit 085775ef6a470b7bc4b694ca541eaf9013916780. However, Snakemake requires Python 3, so this is on hold until I can get the code onto a newer Linux server.
Nice! It looks good. You'll probably need a config file eventually so that you can fill in the wildcards in the Snakefile (otherwise, Snakemake won't know which wildcard values to use when it calls your rules). I usually use the config file to specify the paths to the pipeline's inputs. Config variables can also be overridden from the command line, so it's super easy to swap out inputs (or use a subset of them, if you write some code in your Snakefile to do it).
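A rough illustration of how config values fill wildcards, written in plain Python rather than as a Snakefile. The `expand` function below imitates the behaviour of Snakemake's `expand()` helper (every combination of the given values), and `apply_overrides` is a simplified, hypothetical stand-in for overriding config values with `snakemake --config KEY=VALUE`; the real option parses values differently. The config keys `out_dir` and `videos` are invented examples.

```python
# Hypothetical sketch in plain Python, not Snakemake code: expand() imitates
# Snakemake's expand() helper, and apply_overrides() is a simplified stand-in
# for `snakemake --config KEY=VALUE` (the real option parses values differently).
from itertools import product


def expand(pattern: str, **wildcards: list) -> list:
    """Fill {name} placeholders with every combination of wildcard values."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[n] for n in names))
    ]


def apply_overrides(config: dict, overrides: list) -> dict:
    """Return a copy of config with KEY=VALUE strings applied on top."""
    merged = dict(config)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        # Assumption for this sketch: keys that hold lists take comma-separated values.
        merged[key] = value.split(",") if isinstance(config.get(key), list) else value
    return merged


# Invented config; in a real pipeline this would come from a config file.
config = {"out_dir": "results", "videos": ["colony1", "colony2", "colony3"]}

print(expand("{out_dir}/{video}.csv", out_dir=[config["out_dir"]], video=config["videos"]))
# ['results/colony1.csv', 'results/colony2.csv', 'results/colony3.csv']

# Overriding one key selects a subset of inputs without editing the file.
subset = apply_overrides(config, ["videos=colony1,colony3"])
print(expand("{out_dir}/{video}.csv", out_dir=[subset["out_dir"]], video=subset["videos"]))
# ['results/colony1.csv', 'results/colony3.csv']
```

The list of requested output files is what tells the workflow manager which wildcard values to use, so changing one config value is enough to run the pipeline on a different set of inputs.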
Here's an example Snakemake pipeline I've worked with, if that helps.
One nice thing about Snakemake is that you can run separate parts of the pipeline in different conda environments (i.e. you can call the entire pipeline from a Python 3 environment but have it execute its rules in a Python 2 environment). I started trying to create an environment file in f73028b. Also see issue #6.
The completed steps have been written into a Snakefile, and the pipeline is working.
Future steps to do:
The last two steps have been completed, so the only thing left to do with the pipeline is to move the command-line options and other settings that may need tweaking into a separate configuration file. Currently, one of them is a global constant defined in the pipeline, and the others are forced to their default values.
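A minimal sketch of that change, assuming a JSON config file (Snakemake accepts JSON as well as YAML for its `configfile:` directive, which loads the file into the `config` dictionary inside a Snakefile). Instead of a global constant and hard-coded defaults, every tunable option has a fallback value that the config file can override. The option names in `DEFAULTS` and the `load_settings` function are hypothetical, not the actual ant_tracker options.

```python
# Hypothetical sketch: read tunable settings from a JSON config file instead of
# hard-coding them. The option names are invented, not the ant_tracker options.
import json
from typing import Optional

# Fallback values used when the config file does not set an option.
DEFAULTS = {"min_blob_area": 20, "frame_rate": 30, "output_dir": "results"}


def load_settings(path: Optional[str] = None) -> dict:
    """Merge settings from a JSON file over DEFAULTS, rejecting unknown keys."""
    settings = dict(DEFAULTS)
    if path is None:
        return settings
    with open(path) as fh:
        user = json.load(fh)
    unknown = sorted(set(user) - set(DEFAULTS))
    if unknown:
        # Catch typos early instead of silently ignoring them.
        raise KeyError(f"unknown options: {unknown}")
    settings.update(user)
    return settings
```

For example, if `config.json` contains `{"frame_rate": 60}`, then `load_settings("config.json")` returns the defaults with `frame_rate` replaced by 60. Rejecting unknown keys is a design choice: a misspelled option fails loudly instead of quietly falling back to its default.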
I moved that last step into a new issue because it's a separate thing from making a pipeline.
The issue is here: https://github.com/beelabhmc/ant_tracker/issues/20
Once we have a working pipeline, it might be nice to convert it to a Snakemake pipeline so that it can be run in parallel on a cluster.