ToddG closed this issue 3 years ago
Hi Todd, we've mostly had a very positive experience with snakemake. It feels like we're "self-documenting" the workflow when using it. Our users find the learning curve steep, but worth it. The cluster integration is very good. Other projects in the energy space like calliope also use it.
Downsides: there can be Linux-Windows compatibility problems, and we can confirm the first of the sub-workflow problems you listed.
We haven't compared it with pachyderm; this is the first I've heard of it. We have sunk costs with snakemake now, and it fulfils all our needs, so we are unlikely to switch.
@nworbmot Thank you for the response. I decided to move forward with Apache Airflow and PySpark, though I am fascinated with snakemake and may use it on a future project. I decided against pachyderm b/c it looks like a vendor-locked ecosystem, and the most compelling feature, provenance, seems to reinforce that vendor-lock.
I am looking into distributed scientific computation platforms for a client.
snakemake
?snakemake
...I'm asking b/c it seems that pachyderm has several advantages:
Snakemake has several advantages on its side, namely:
My personal preference would be to use snakemake because it is simpler. However, the input provenance in pachyderm is compelling as well...
BTW - have you had issues with sub-workflows in snakemake like these?
Thank you, Todd Greenwood