HSF / PyHEP.dev-workshops

PyHEP Developer workshops
https://indico.cern.ch/e/PyHEP2023.dev

Analysis code readability and performance #3

Closed: jpivarski closed this issue 9 months ago

jpivarski commented 1 year ago

Expressing small-scale analysis steps intuitively, to reduce cognitive burden, while also scaling to large datasets, using vectorization, automatic differentiation, JIT-compilation, and C++/Julia/Rust interfaces where possible.
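For instance, a minimal Awkward Array sketch of such a step (the field names and values are invented for illustration): a vectorized dimuon-mass calculation over jagged per-event collections, expressed once with no explicit event loop.

```python
import awkward as ak
import numpy as np

# Invented jagged muon kinematics: one variable-length list per event.
muons = ak.Array(
    {
        "pt": [[40.0, 30.0], [25.0], [60.0, 35.0, 20.0]],
        "eta": [[0.1, -1.2], [2.0], [0.5, 0.4, -0.3]],
        "phi": [[0.3, 2.9], [-1.0], [1.2, -2.2, 0.7]],
    }
)

# All unique muon pairs per event, then the invariant-mass formula
# (massless approximation) applied in one vectorized expression.
m1, m2 = ak.unzip(ak.combinations(muons, 2))
mass = np.sqrt(
    2.0 * m1.pt * m2.pt * (np.cosh(m1.eta - m2.eta) - np.cos(m1.phi - m2.phi))
)
```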

gordonwatts commented 1 year ago

This feels a lot like User Experience #9, doesn't it?

alexander-held commented 1 year ago

#3 to me is a bit more low-level: the code people write using e.g. awkward (how to implement computationally efficient and readable analysis logic), while #9 is more high-level: how well do the services / libraries being used integrate, how easy are they to use, are there pieces of functionality missing.

@jpivarski also mentions in https://github.com/HSF/PyHEP.dev-workshops/issues/9#issuecomment-1611907200 that #3 is about performance, which is less of a focus for #9.

jpivarski commented 1 year ago

I wanted to get people who are interested in code readability and people who are interested in performance into the same discussion.

#3 was supposed to be about "programming in the small" (expressing individual formulas and small-scale steps in the analysis, line by line) and #4 was supposed to be about "programming in the large" (fitting services and libraries together). So if #9 is about

how well do the services / libraries being used integrate, how easy are they to use, are there pieces of functionality missing

then I intended #4 for that.

But instead of people fitting into the original set of categories, merging or splitting them as needed, most people are creating whole new categories. (The opposite happened at Scientific-Python; sociology is fun!) But we can go with this flow instead: the thing I wanted to avoid was silence/no interaction.

After this first wave, I'm going to bring the less talkative participants into the discussions as they exist at that time—I'll suggest topics based on what they wrote in their "why I'm interested in this workshop" boxes in the registration form.

alexander-held commented 1 year ago

#4 to me sounded not so much concerned with tools as with higher-level orchestration. Something like "I cannot plot my histograms easily" would be a #9 topic, but not a #4 to me (and certainly not a #3).

vgvassilev commented 1 year ago

+1

sudo-panda commented 1 year ago

+1

pfackeldey commented 1 year ago

+1

mattbellis commented 1 year ago

+1 I'm very intrigued by @jpivarski's initial statement about the "small-scale analysis steps". I've often wished for a more Unix-like philosophy in many of our analysis tools, where there are small packages that each do one thing and do it well, and you pipe output from one to another.

ROOT tried to keep everything in one environment, which made it difficult at times to interface with other tools. My perspective is that the community has moved in a different direction, and we now have smaller, piecemeal tools for the different steps; I think that's for the best.

I'm not sure whether my interests are in line with or outside the scope of "Analysis code readability", but I would love a world in which each step of an analysis was well defined and users could use whatever works for them and whatever makes sense for how they think about things, then pipe that output to a different step of the analysis. There are obviously challenges with this approach.
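As a rough sketch of that pipeline idea with today's Python HEP stack (the file, tree, and branch names here are hypothetical): uproot reads, Awkward Array selects, hist aggregates, and mplhep plots, each library doing one step.

```python
import uproot
import awkward as ak
import hist
import mplhep

# Hypothetical file and branch names; each library handles one step.
muon_pt = uproot.open("events.root")["Events"].arrays(["Muon_pt"]).Muon_pt
selected = muon_pt[muon_pt > 20.0]                                   # select
h = hist.Hist.new.Reg(50, 0, 200, name="pt", label="Muon pT [GeV]").Double()
h.fill(pt=ak.flatten(selected))                                      # histogram
mplhep.histplot(h)                                                   # plot
```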

Whether or not this is in the scope of this part of the workshop, I'm glad to see this broader discussion happening!

ianna commented 1 year ago

+1

kjvbrt commented 1 year ago

+1

henryiii commented 1 year ago

+1

alexander-held commented 1 year ago

We discussed an example of readability a bit in the context of https://gist.github.com/alexander-held/a9ff4928fe33e8e0c09dd27fbdfd24d9. Some ideas for potential interface improvements are in this version from @gordonwatts, including shorthands for indexing and string axes: https://gist.github.com/gordonwatts/87d29b9e1dd13f0958968cd194e7b929.
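For reference, a minimal sketch of the flavor of shorthand being discussed with hist (this is an illustration, not the gists' exact interface): a string-category axis plus concise indexing by category value or axis name.

```python
import numpy as np
import hist

# A histogram with a string-category axis alongside a regular axis.
h = (
    hist.Hist.new
    .StrCat(["ee", "mumu"], name="channel")
    .Reg(50, 0.0, 200.0, name="mass", label="m [GeV]")
    .Double()
)
h.fill(channel="ee", mass=np.random.normal(91.0, 5.0, 1000))
h.fill(channel="mumu", mass=np.random.normal(91.0, 5.0, 1500))

# Shorthand indexing: select a category by string or by axis name,
# without remembering the axis position.
h_ee = h["ee", :]
h_mumu = h[{"channel": "mumu"}]
```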

Moelf commented 1 year ago

+1

On the most radical side of the spectrum, I guess "code readability and performance" is basically "what language to use".

I'm happy to provide a perspective on Julia and metaprogramming.

Moelf commented 1 year ago

I'm wondering if there's any appetite to turn/expand https://github.com/iris-hep/adl-benchmarks-index into some readability metric. It's very difficult to draw a single conclusion from such metrics, like ranking the implementations in some order, because there are many aspects of readability.

But I think we should be able to report some metric out of this collection; for example, in the Julia repo I compiled numbers for:

length (in characters) of the function body after stripping spaces and line breaks, excluding plots, file opening, etc.

We can probably come up with a few more?
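A rough sketch of how that character-count metric could be computed (the helper name and the example snippet are invented for illustration):

```python
import re

def stripped_length(source: str) -> int:
    """Character count of a code snippet after removing all whitespace."""
    return len(re.sub(r"\s+", "", source))

# An invented example query body (plots and file opening already excluded).
query = """
pt = events.Jet_pt
h.fill(ak.flatten(pt))
"""
print(stripped_length(query))  # 38 characters for this snippet
```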