Open bkuczenski opened 5 years ago
What do you think about this checklist? I'm not sure it fits exactly here, but these are all important good practices.
And Snakemake? I came across it in a workshop on reproducibility of energy system models, but I have not used it yet.
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human readable, Python based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition. Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment.
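For anyone who has not seen a rule-based workflow before, here is a minimal sketch of the general idea in plain Python. It is not Snakemake's syntax or API: `Rule`, `build`, `demo`, and the file names are all hypothetical, and the "rerun only when the output is missing or older than its inputs" check is a simplifying assumption rather than a description of how Snakemake decides what to run.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    """One workflow step: produces `output` from `inputs` by calling `action`."""
    output: str
    inputs: List[str]
    action: Callable[[List[str], str], None]


def build(target: str, rules: Dict[str, Rule], log: List[str]) -> None:
    """Bring `target` up to date, building its inputs first."""
    rule = rules.get(target)
    if rule is None:
        # No rule produces this file, so it must already exist as raw input.
        if not os.path.exists(target):
            raise FileNotFoundError(f"no rule and no file for {target}")
        return
    for path in rule.inputs:
        build(path, rules, log)
    # Assumption for this sketch: rerun only if the output is missing
    # or older than one of its inputs.
    stale = not os.path.exists(target) or any(
        os.path.getmtime(p) > os.path.getmtime(target) for p in rule.inputs
    )
    if stale:
        rule.action(rule.inputs, target)
        log.append(os.path.basename(target))


def upper(inputs: List[str], output: str) -> None:
    """Hypothetical cleaning step: upper-case the input text."""
    with open(inputs[0]) as src, open(output, "w") as dst:
        dst.write(src.read().upper())


def count(inputs: List[str], output: str) -> None:
    """Hypothetical reporting step: count whitespace-separated words."""
    with open(inputs[0]) as src, open(output, "w") as dst:
        dst.write(str(len(src.read().split())))


def demo() -> Tuple[List[str], List[str], str]:
    """Run the two-step workflow twice and report which steps ran each time."""
    with tempfile.TemporaryDirectory() as workdir:
        raw = os.path.join(workdir, "raw.txt")
        clean = os.path.join(workdir, "clean.txt")
        report = os.path.join(workdir, "report.txt")
        with open(raw, "w") as f:
            f.write("wind solar hydro")
        rules = {
            clean: Rule(clean, [raw], upper),
            report: Rule(report, [clean], count),
        }
        first: List[str] = []
        second: List[str] = []
        build(report, rules, first)   # builds clean.txt, then report.txt
        build(report, rules, second)  # nothing is stale, so nothing runs
        with open(report) as f:
            return first, second, f.read()


if __name__ == "__main__":
    print(demo())
```

The point for our purposes is that the whole analysis is declared as outputs that depend on inputs, so anyone can regenerate the final file from the raw data with one call. Whether that is worth adopting for the hackathon is a separate question.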
I think the checklist looks great. Those are all good housekeeping practices, but attaining them all would take more than a week. It seems like the role our group can play is in advising the others on which goals are worth spending time on and which can be pushed to later.
It's also worth considering that "reproducibility" in the scientific sense is different from "cross-platform scalability" in the high-performance computing sense. We shouldn't let the latter become a reductive definition of the former.
Do you want to take a crack at filtering that list to the things that are likely most important to the hackathon deliverables?
Proposition: should one of our tasks or ongoing roles in this working group be to audit the source management of the other groups? Alternatively, since we have commit access to the repos with which we're involved, should it simply be our responsibility to enact good source management?
Before the start of the hackathon, we need to promulgate a set of expectations and guidelines for how the other working groups should plan to make their work reproducible. This will include:
What do we tell them?