getpopper / popper

Container-native task automation engine.
https://getpopper.io
MIT License

proposal: use CWL for portable tool and workflow definitions #73

Open mr-c opened 7 years ago

mr-c commented 7 years ago

Hello, I am Michael R. Crusoe, one of the co-founders of the Common Workflow Language project, its Community Engineer, and a former sysadmin/SRE.

I really like this project's ethos!

You may benefit from examining the standards of the Common Workflow Language project. They define an interface for running command-line data analysis programs and the workflows built from them.

These standardized descriptions are portable and executable on a variety of workflow management systems that collectively support almost any backend: local execution and various HPC and cloud interfaces. Using a standard description decouples the execution from the description of the data analysis, calculation, or simulation.
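To make this concrete, here is a minimal sketch of what such a description can look like, built as a Python dict and serialized as JSON (CWL documents may be written in JSON or YAML). The tool being wrapped (`wc -l`), the function name `make_line_count_tool`, and the identifiers `input_file`, `count`, and `line_count.txt` are illustrative assumptions, not anything taken from Popper.

```python
import json


def make_line_count_tool() -> dict:
    """Return a CWL CommandLineTool description for a hypothetical `wc -l` wrapper."""
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        # The program to run, plus fixed leading arguments.
        "baseCommand": ["wc", "-l"],
        # One input file, appended to the command line as the first positional argument.
        "inputs": {
            "input_file": {
                "type": "File",
                "inputBinding": {"position": 1},
            }
        },
        # Capture standard output into a named file and expose it as an output.
        "stdout": "line_count.txt",
        "outputs": {
            "count": {"type": "stdout"},
        },
    }


if __name__ == "__main__":
    # Print the description; the JSON text can be saved to a file
    # and handed to a CWL-aware runner.
    print(json.dumps(make_line_count_tool(), indent=2))
```

The description only states what to run and which files go in and out; where and how it runs is left to whichever workflow management system executes it.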

I would also like to bring to your attention the http://researchobject.org/ methods and standards for representing an output of research (a figure, data tables, raw results, and others) along with attribution for all contributors, provenance for the data and code, and an abstract representation of the workflow used to create the output (regardless of whether CWL or a system-specific approach was taken).

Cheers,

ivotron commented 7 years ago

Hi @mr-c, thanks a lot for reaching out.

We learned about CWL thanks to a reference from the Genomics folks here at UCSC (devs of Toil). I looked into whether we could express some of our workflows in it, but at the time (~6 months ago) it didn't seem to be able to express loops and distributed tasks. Most of the experiments we do in distributed storage, analysis, and data management systems are relatively simple and can be expressed using bash, Ansible, and/or docker compose. These are simple benchmarking experiments which, when visualized as a DAG, would have something like 3 nodes without much complexity.

Having said that, the goal of Popper is to be tool-agnostic: it abstracts experiments, treats them as a black box, and assumes that they will be implemented using reproducibility-friendly tools/standards (such as CWL, researchobject, bash, and Ansible). Since Popper is a methodology, one can use any tool as long as the researcher is comfortable creating scripts for it.

thanks!

mr-c commented 7 years ago

You are welcome @ivotron, glad to hear that you've already thought about this. Cheers!

mr-c commented 3 years ago

Dear @ivotron

In https://conferences.computer.org/scwpub/pdfs/CANOPIE-HPC2020-6GN8joymhwMWpK4pB3hqMl/306200a008/306200a008.pdf / 10.1109/CANOPIEHPC51917.2020.00007 it is written:

"Popper allows exporting a workflow to other workflow specification formats such as CWL"

Can you point me to any code or documentation about that?

ivotron commented 3 years ago

hi @mr-c, thanks a lot for reaching out!

We haven't documented the API yet. The code is available here. This WorkflowExporter class allows creating plugins that export Popper workflows to other formats like CWL, and we are planning to implement exporters for CI services (CircleCI, GitLab, and GitHub Actions).

We would love to have one exporter for CWL! 😃
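
As a rough illustration of what a CWL exporter might do, here is a minimal sketch in Python. It does not subclass or call Popper's actual WorkflowExporter, whose interface is not documented in this thread. The function names `step_to_cwl_tool` and `export_to_cwl` are hypothetical, and the input is assumed to be a plain list of step dicts with `uses` (a `docker://` image reference), `args`, and an optional `id`.

```python
import json
from typing import Any, Dict, List

# Assumed step shape (not Popper's real internal model):
# {"id": "...", "uses": "docker://image:tag", "args": ["cmd", "arg"]}
Step = Dict[str, Any]

DOCKER_PREFIX = "docker://"


def step_to_cwl_tool(step: Step) -> Dict[str, Any]:
    """Map one step to an embedded CWL CommandLineTool that runs in a container."""
    image = step["uses"]
    if image.startswith(DOCKER_PREFIX):
        image = image[len(DOCKER_PREFIX):]
    return {
        "class": "CommandLineTool",
        "baseCommand": list(step["args"]),
        "hints": {"DockerRequirement": {"dockerPull": image}},
        "inputs": {},
        "outputs": {},
    }


def export_to_cwl(steps: List[Step]) -> Dict[str, Any]:
    """Build a CWL Workflow document with one CWL step per input step.

    Limitation of this sketch: CWL orders steps by data dependencies, and
    these steps declare none, so a runner is free to execute them in any
    order. A real exporter would need to model inputs and outputs.
    """
    cwl_steps: Dict[str, Any] = {}
    for index, step in enumerate(steps):
        step_id = step.get("id", f"step_{index + 1}")
        cwl_steps[step_id] = {"run": step_to_cwl_tool(step), "in": {}, "out": []}
    return {
        "cwlVersion": "v1.2",
        "class": "Workflow",
        "inputs": {},
        "outputs": {},
        "steps": cwl_steps,
    }


if __name__ == "__main__":
    example_steps = [
        {"id": "list", "uses": "docker://alpine:3.9", "args": ["ls", "-la"]},
        {"uses": "docker://alpine:3.9", "args": ["echo", "done"]},
    ]
    print(json.dumps(export_to_cwl(example_steps), indent=2))
```

Embedding each tool under `run` keeps the output to a single document; writing one tool file per step and referencing it by path would be an equally reasonable design.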