ComputationalRadiationPhysics / picongpu

Performance-Portable Particle-in-Cell Simulations for the Exascale Era :sparkles:
https://picongpu.readthedocs.io
Other
704 stars 217 forks source link

Conventions around `snakemake` #5103

Open chillenzer opened 2 months ago

chillenzer commented 2 months ago

In closing this PR, I had a look at the Snakemake workflows here and there are a few things I stumbled across:

  1. There is a best practices page in the Snakemake docs suggesting amongst other things to use

  2. Snakemake provides its own environment management/deployment as well as batch system abstraction (as a plugin). This is functionality that significantly overlaps with tbg and both of these are delicate issues in the fragile and diverse HPC ecosystems we run in.

In the current stage, our usage does not conform to any of these standards. If our strategic decisions imply to professionalise the use of Snakemake, I think there would be value in moving more into that direction. However, my personal experience is that these things are awfully fiddly to setup on a cluster and their adoption would create tensions with tbg. (There are, of course, some low-hanging fruits like the adoption of the directory structure and formatter, e.g., in pre-commit.)

This issue is supposed to spark a discussion and trigger a conscious decision about such things. It is by no means intended as a suggestion to implement the above ideas. Involved in this discussion should probably be at least @psychocoderHPC, @PrometheusPi, @JessicaTiebel.

JessicaTiebel commented 2 months ago

Thanks for looking into the Snakemake workflow and giving input.

For 1.: To be honest I wasn't aware of these things. Having a code quality checker, a formatter and standardised directory structure sounds like a very good idea and on the first glance also doesn't seem to be super much work. Thanks for bringing this up.

For 2.: I am more aware of the issues you mention here. I agree that using the batch system abstraction has a lot of advantages and makes transition between different clusters much easier. This would be a huge benefit, but also is a lot of work (also the reason I didn't do it initially). I am not sure if it is possible to abstract the tbg in a way that it is also flexible enough to work with different clusters. I am definitely open for discussions regarding this.