HSF / PyHEP.dev-workshops

PyHEP Developer workshops
https://indico.cern.ch/e/PyHEP2023.dev
BSD 3-Clause "New" or "Revised" License

Integration of workflow systems into physics analysis and the scientific python ecosystem #31

Open pfackeldey opened 4 months ago

pfackeldey commented 4 months ago

https://github.com/HSF/PyHEP.dev-workshops/issues/31#issuecomment-2269133847:

Typical HEP analyses (e.g. at the LHC) comprise a large number of steps with non-trivial dependencies between them. Here, one can use workflow tools, e.g. https://github.com/spotify/luigi, to describe & execute these steps and their dependencies. This is not directly related to the heavy batch processing that is typically done using e.g. Dask / HTCondor / Slurm, since that processing covers only a subset of the steps of a whole analysis.
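To illustrate what "describing and executing steps and their dependencies" can look like, here is a minimal sketch in plain Python. It is not luigi's API: it only uses the standard-library `graphlib` module, and the step names (`select_events`, `fill_histograms`, and so on) and the `run_workflow` helper are hypothetical, chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical analysis steps, each mapped to the set of steps it depends on.
dependencies = {
    "select_events": set(),
    "fill_histograms": {"select_events"},
    "estimate_background": {"select_events"},
    "fit": {"fill_histograms", "estimate_background"},
    "plot": {"fit"},
}


def run_workflow(dependencies, done=()):
    """Return the steps that were run, in an order that respects dependencies.

    Steps listed in `done` are treated as already completed and are skipped.
    """
    executed = []
    for step in TopologicalSorter(dependencies).static_order():
        if step in done:
            continue
        # A real workflow tool would execute the step's payload here.
        executed.append(step)
    return executed


order = run_workflow(dependencies)
resumed = run_workflow(dependencies, done={"select_events"})
```

The `done` argument mimics, in a very simplified way, how a workflow tool can skip steps whose outputs already exist, so that only the missing parts of an analysis are rerun.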

eduardo-rodrigues commented 4 months ago

Interested 👍.

JonasEppelt commented 4 months ago

We (@AlexanderHeidelbach and I) recently took over the maintenance and development of b2luigi for the Belle II collaboration. Therefore we would be very interested in exchanging ideas and experiences on this topic and are looking for overlap or maybe opportunities for collaboration.

bfis commented 4 months ago

I'm interested in this topic, but with a particular focus on enabling greater flexibility & reusability (in the context of physics analyses) by addressing specific shortcomings in the underlying structures, in particular luigi's handling of parameters and dependencies. That is, I'm less focused on the integration aspect and more on the idea that whatever can/will be integrated should be iterated/improved upon before it is too late, to avoid unnecessary baggage.

ynikitenko commented 3 months ago

Dear topic starter,

it would be great if you could expand on what you mean by this topic.

I am giving a talk on an architectural framework for data analysis in Python, so I'm generally interested in this theme.

ynikitenko commented 3 months ago

It would be good to see some other examples of workflows and their comparisons (e.g. there are proposed discussions about dask, but there are also workload managers like Slurm).

pfackeldey commented 3 months ago

Dear @ynikitenko, typical HEP analyses (e.g. at the LHC) comprise a large number of steps with non-trivial dependencies between them. Here, one can use workflow tools, e.g. https://github.com/spotify/luigi, to describe & execute these steps and their dependencies. This is not directly related to the heavy batch processing that is typically done using e.g. Dask / HTCondor / Slurm, since that processing covers only a subset of the steps of a whole analysis.

ynikitenko commented 3 months ago

Dear @pfackeldey, thank you for the nice example. Would you be so kind as to add it to the starting message for easier navigation?