ainar opened this issue 1 year ago
Hi @ainar, good question. It was mainly developed to have all the functionality we wanted, but if there is another framework that ticks all the boxes and already has a good community around it, it would indeed be worth considering porting our pipelines to that framework and contributing to it instead.
Back then I tested quite a few frameworks (to me, Snakemake seems to be the best known, with the largest community), but none of them did exactly what we wanted. One of the reasons we created our own pipeline tool was indeed to keep it simple. That said, we are now gradually coming to think that it would be nice to have more complex functionality, such as multi-machine scheduling.
Some (maybe unique) functionality of synpp:
population_output_sr0.5_rs1234.csv
which is not very flexible. I don't know whether the other systems you cited are better at passing configuration parameters down the stage hierarchy. Basically, each stage in synpp is not only the stage itself but also its parametrization.
If we find something that can do all of this with an existing community behind it, that would be quite nice :)
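To illustrate what "a stage is also its parametrization" can mean in practice, here is a minimal sketch that identifies each stage instance by its name, the configuration values it uses, and the identities of its upstream stages. This is not synpp's actual implementation; `Registry`, `stage_key`, and the parameter names `sampling_rate`, `random_seed`, and `output_format` are hypothetical (the first two are only a guess at what `sr0.5_rs1234` encodes in the file name above).

```python
import hashlib
import json


def stage_key(name, payload):
    """Hypothetical: derive a short, stable identifier from a stage name and
    the data that determines its result."""
    blob = json.dumps({"stage": name, "payload": payload}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]


class Registry:
    """Hypothetical stage registry: each stage declares which config
    options it reads and which upstream stages it depends on."""

    def __init__(self):
        self.stages = {}

    def register(self, name, uses=(), deps=()):
        self.stages[name] = (tuple(uses), tuple(deps))

    def resolve(self, name, config):
        uses, deps = self.stages[name]
        # Only the options this stage actually reads enter its identity.
        own = {key: config[key] for key in uses}
        # Upstream identities are passed down, so a change anywhere
        # above this stage also changes this stage's identity.
        upstream = [self.resolve(dep, config) for dep in deps]
        return stage_key(name, {"own": own, "deps": upstream})


registry = Registry()
registry.register("population", uses=["sampling_rate", "random_seed"])
registry.register("output", uses=["output_format"], deps=["population"])
```

With this scheme, changing `random_seed` changes the identity of both `population` and `output`, while changing `output_format` leaves `population` untouched, so a cached result for it could be reused. Encoding the same information in file names works for one level but quickly becomes unwieldy as the hierarchy grows.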
I think other options include, for instance, Celery or Airflow.
I discovered synpp through https://github.com/eqasim-org/ile-de-france, and it is a handy tool. Doing some research, I found that this kind of framework, usually called a "data pipeline" framework, is widespread, for example in bioinformatics and, more generally, in data science research.
So here are my questions:
For example:
I think the appeal of synpp is that it is more straightforward than the other tools I listed (or maybe I am just used to it). Because of that, I think synpp should stay as simple as possible rather than reinvent the wheel with each new feature. Do you see it differently? If you considered alternatives, what were your thoughts?
That makes me wonder: could synpp be officially generalized for work unrelated to population synthesis?