PyPSA / pypsa-eur

PyPSA-Eur: A Sector-Coupled Open Optimisation Model of the European Energy System
https://pypsa-eur.readthedocs.io/

Documentation/advice regarding choice of workflow system (snakemake vs. pachyderm) #535

ToddG closed this issue 3 years ago

ToddG commented 3 years ago

I am looking into distributed scientific computation platforms for a client.

I'm asking b/c it seems that pachyderm has several advantages:

Snakemake has several advantages on its side, namely:

My personal preference would be to use snakemake because it is simpler. However, the input provenance in pachyderm is compelling as well...

BTW - have you had issues with sub-workflows in snakemake like these?

Thank you, Todd Greenwood

nworbmot commented 3 years ago

Hi Todd, we've mostly had a very positive experience with snakemake. It feels like we're "self-documenting" the workflow when using it. Our users find the learning curve steep, but worth it. The cluster integration is very good. Other projects in the energy space like calliope also use it.

Downsides: There can be Linux-Windows compatibility problems. We can also confirm the first problem you list above with sub-workflows.

We haven't compared it with pachyderm; this is the first I've heard of it. We have sunk costs with snakemake now, and it fulfils all our needs, so we are unlikely to switch.

ToddG commented 3 years ago

@nworbmot Thank you for the response. I decided to move forward with Apache Airflow and PySpark, though I am fascinated by snakemake and may use it on a future project. I decided against pachyderm b/c it looks like a vendor-locked ecosystem, and its most compelling feature, provenance, seems to reinforce that lock-in.