Closed Sheshuk closed 2 years ago
Created using

```python
%%timeit
generate_fluence('models/Bollig_2016/s11.2c', 'Bollig_2016', 'NoTransformation', d=10)
```

and

```python
%%timeit
generate_time_series('models/Bollig_2016/s11.2c', 'Bollig_2016', 'NoTransformation', d=10)
```
| branch | fluence | time_series |
|---|---|---|
| main (4630b45892952ff478e6e479a6444f2894862220) | 19.5 s ± 108 ms | 5.67 s ± 24.9 ms |
| Sheshuk/refactor_fluence (https://github.com/SNEWS2/snewpy/pull/152/commits/d9eebf637bf52bee89e32f5fd137c70e893bfa3b) | 1.54 s ± 19 ms | 906 ms ± 7.7 ms |
I compared the files generated at the previous step (fluence_new.txt, fluence_main.txt) with this script:

```python
import numpy as np
import matplotlib.pyplot as plt

f0 = np.loadtxt('fluence_main.txt')
f1 = np.loadtxt('fluence_new.txt')
# compute the relative difference
df = (f0 - f1) / f0
# skip the first row (E ~ 0) and the first column (energy)
df = df[1:, 1:]
E = f0[1:, 0] * 1e3
# plot one line per flavor
plt.plot(E, df)
plt.xlabel('E, MeV')
plt.ylabel(r'$(\Phi_1-\Phi_0)/\Phi_0$')
plt.show()
```
(Edit: lines correspond to different flavors.) The relative difference is below 3e-4, which can be explained by the use of `astropy.units` instead of hardcoded values like here: https://github.com/SNEWS2/snewpy/blob/bd468443a41ff4461822e43fdc12956f3ce4fb7b/python/snewpy/snowglobes.py#L122

Overall I think that now the result should be slightly more accurate.
@evanoconnor @jpkneller can one of you, as the people who authored the initial `to_snowglobes` routines, please cross-check my refactoring? This branch is based on #138, but it is a separate change, so I decided to keep it in a separate PR.
I’ve started to look at this PR today, after the request at today’s telecon. A couple of comments:

1. The `Flux` class appears to be designed to work with an arbitrary number of dimensions; but I’m not sure when we would use any dimensions other than flavour × energy × time? If we don’t actually use this, I wonder whether less general code might be easier to understand?
2. I don’t think I feel comfortable reviewing the whole refactoring; Jim or Evan are probably better able to do that. However, I can see that there are a couple of elements to this PR that are independent of the main refactoring (e.g. binning in the .glb file, letting `t` be a vector, moving some repeated code to functions). Maybe it would help to split those out into separate PRs? I’d be happy to review those quickly, so others have an easier time reviewing this PR!

Thanks, Jost!
> The `Flux` class appears to be designed to work with an arbitrary number of dimensions; but I’m not sure when we would use any dimensions other than flavour × energy × time? If we don’t actually use this, I wonder whether less general code might be easier to understand?
It's done for the cases where we integrate over energy or time, or select a specific flavour - that way we get the same kind of `Flux` object back. Having separate objects for `flavor × time`, `flavor × energy`, `time × energy` and `flavor × time × energy` would involve a lot of repeated code.

I agree that general code is not very transparent, but to me it seems the lesser evil (as long as the user interface is clear) compared to having a zoo of classes doing the same thing. What do you think?
Also I will add unit tests to make sure all the functions work as intended.
> I don’t think I feel comfortable reviewing the whole refactoring; Jim or Evan are probably better able to do that. However, I can see that there are a couple of elements to this PR that are independent of the main refactoring (e.g. binning in the .glb file, letting `t` be a vector, moving some repeated code to functions). Maybe it would help to split those out into separate PRs? I’d be happy to review those quickly, so others have an easier time reviewing this PR!
I totally agree, and will try to split this PR.
Completely agree that having a zoo of classes would be terrible!
On the other hand, having a single class that could have an arbitrary number of axes in an arbitrary order (or with an arbitrary string as its name) has a few downsides:

- A `Flux` object could be a number of different physical quantities: an actual flux (time × energy), a collection of multiple different fluxes (flavor × time × energy), a fluence (energy), or a collection of different fluences, …
- `Flux` objects may make some assumptions (e.g. hard-coded axis names that must exist, or a minimum number of dimensions) and it’s up to the developer to keep track of what structure each specific `Flux` object has and what methods can be applied to it.

Think about the problem very naïvely: What if a `Flux` were simply a 2D array (hard-coded to be time × energy)? Integrating over one dimension would still return a 2D array, which simply has length 1 along that axis; and a collection of `Flux`es for different flavours would be a `dict[Flavor, Flux]`.
We’d lose the extensibility to add extra axes for other quantities (though it sounds like we don’t expect to use anything beyond flavor, time and energy anyway), and may need a few extra square brackets to access elements of a `Flux` if we don’t `squeeze()` it down to 1D after integration; but I think we’d gain a lot of clarity (and could delete some of the helper functions that are needed to make the code very general).
Unless there are any implementation issues I’m missing right now, I think that would be a fairly good trade-off to make. What do you think?
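A minimal sketch of this "fixed 2D" proposal; the class layout, flavor values and method names here are hypothetical illustrations, not snewpy's actual API:

```python
from dataclasses import dataclass
from enum import IntEnum

import numpy as np


class Flavor(IntEnum):
    NU_E = 0
    NU_E_BAR = 1
    NU_X = 2
    NU_X_BAR = 3


@dataclass
class Flux:
    array: np.ndarray   # always 2D: (time, energy)
    time: np.ndarray    # time axis values
    energy: np.ndarray  # energy axis values

    def integrate_time(self) -> "Flux":
        # trapezoidal integration over time; keep the result 2D
        # with a length-1 time axis, as described above
        integrated = np.trapz(self.array, x=self.time, axis=0)[np.newaxis, :]
        return Flux(integrated, self.time[:1], self.energy)


# a collection of fluxes for different flavours is just a plain dict
fluxes = {f: Flux(np.ones((5, 3)), np.linspace(0, 1, 5), np.linspace(1, 30, 3))
          for f in Flavor}
fluence = {f: flux.integrate_time() for f, flux in fluxes.items()}
```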
> Think about the problem very naïvely: What if a `Flux` were simply a 2D array (hard-coded to be time × energy)? Integrating over one dimension would still return a 2D array, which simply has length 1 along that axis; and a collection of `Flux`es for different flavours would be a `dict[Flavor, Flux]`.
This would have two additional downsides:

- You'll need to pass the values for the dimensions along with this 2D array - they're needed almost everywhere you use the flux. And in my solution they are packed inside the `Flux` object.
- Using `dict[Flavor, Flux]` requires looping over the flavors again and again. While having flavor as an array dimension, like `Flavor × time × Energy`, it's easy and natural to apply the flavor transformations as a matrix operation. This would improve the
  - performance: vector operations instead of Python looping,
  - flexibility: several flavor transformations can be applied sequentially,
  - code quality: something like `transformed_spectra = initial_spectra @ flavor_xform` instead of what we have in https://github.com/SNEWS2/snewpy/blob/f9e7556ddc3367f2e1845525b9e3c34fe4da9bec/python/snewpy/models/base.py#L145-L159

If you have specific concerns about clarity, please point me to the unclear places, so we can have a less general discussion.

If your concern is about how to test this general code, I plan to use the `hypothesis` package, which does a great job of generating test suites given a general rule.
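The matrix-operation idea can be sketched with plain numpy; the shapes, names and the uniform placeholder transition matrix below are illustrative assumptions, not snewpy code:

```python
import numpy as np

n_flavors, n_times, n_energies = 4, 100, 50
rng = np.random.default_rng(0)
initial_spectra = rng.random((n_flavors, n_times, n_energies))

# transition matrix: flavor_xform[i, j] = probability that initial
# flavor j is observed as flavor i (uniform placeholder values)
flavor_xform = np.full((n_flavors, n_flavors), 0.25)

# one vectorised contraction over the initial-flavor axis
# replaces a per-flavor Python loop
transformed_spectra = np.einsum('ij,jte->ite', flavor_xform, initial_spectra)
```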
> Think about the problem very naïvely: What if a `Flux` were simply a 2D array (hard-coded to be time × energy)? Integrating over one dimension would still return a 2D array, which simply has length 1 along that axis; and a collection of `Flux`es for different flavours would be a `dict[Flavor, Flux]`.
>
> This would have two additional downsides:
>
> - You'll need to pass the values for the dimensions along with this 2D array - they're needed almost everywhere you use the flux. And in my solution they are packed inside the `Flux` object.
Sorry, I should have worded that better: `Flux` would still be a class; but `self.array` would be guaranteed to be a 2D array. So we’d still get the same encapsulation; we’d just be able to cut most of the code that’s necessary to support an arbitrary number of dimensions.
> - Using `dict[Flavor, Flux]` requires looping over the flavors again and again. While having flavor as an array dimension, like `Flavor × time × Energy`, it's easy and natural to apply the flavor transformations as a matrix operation. This would improve the
>   - performance: vector operations instead of Python looping,
It is a loop over just 4 flavours, while we’d still benefit from vector operations on the (potentially) thousands of entries in the array. So we should still get almost all the performance benefit. If merging the flavours into a single array were responsible for most of the performance improvement achieved in this PR, then yes, I’d bite the bullet and accept it. But if we can get almost all the benefit without it, then I would prefer to sacrifice a little bit of performance for much clearer code.
> - flexibility: several flavor transformations can be applied sequentially
Wouldn’t it be more efficient to combine all transformations first, and then apply the resulting matrix to the data, rather than applying each transformation to the large array individually?
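A small numpy check of this idea, under the assumption that the transformation matrices are independent of energy and time (array shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
spectra = rng.random((4, 1000, 100))  # flavor x time x energy
xform_a = rng.random((4, 4))          # first transformation
xform_b = rng.random((4, 4))          # second transformation

# apply each transformation to the large array in turn...
step_by_step = np.einsum('ij,jte->ite', xform_b,
                         np.einsum('ij,jte->ite', xform_a, spectra))

# ...or combine the small 4x4 matrices first, then apply the product once
combined = np.einsum('ij,jte->ite', xform_b @ xform_a, spectra)

# both orderings give the same result for constant matrices
assert np.allclose(step_by_step, combined)
```

This shortcut no longer applies once the matrices themselves depend on energy and time.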
> - code quality: something like `transformed_spectra = initial_spectra @ flavor_xform` instead of what we have in https://github.com/SNEWS2/snewpy/blob/f9e7556ddc3367f2e1845525b9e3c34fe4da9bec/python/snewpy/models/base.py#L145-L159
Agree that a single line would be nicer; but having one line per flavour is still fairly short and very easy to understand.
> If you have specific concerns about clarity, please point me to the unclear places, so we can have a less general discussion.
I have two concrete concerns:

1. A flux, a fluence and a collection of fluxes in different flavours are physically very different things.
2. Supporting an arbitrary number of dimensions, when we plan to use only time and energy (and perhaps flavour; see 1.), makes the code more complex than it needs to be.

So, in practice I would suggest the following changes:

- Add a `FluxCollection` class that combines multiple `Flux` objects corresponding to different neutrino flavours.
  - Internally, this could simply be a `dict[Flavor, Flux]` (or perhaps some other data type that has better performance characteristics while keeping the conceptual separation between an individual flux and a collection of fluxes).
  - Move the `to_snowglobes()` and `_save_to_snowglobes()` functions from `Flux` to this new `FluxCollection`, since they are only applicable to objects that contain fluxes for all flavours.
  - Move the per-flavor transformation code in `base.py` to a `__matmul__()` function in this class, so higher-level code can apply a `FlavorTransformation` in a single line as you suggest above.
- The `Flux` class should be fixed to 2D (time × energy).
  - For the axis names (within a `Flux` object), we could have a simple two-element `IntEnum`. In addition to the usual advantages of enums over strings for this use case, that would also remove the need for the three `get_axis*` helper functions.
  - Add a `Flux.squeeze()` function.

> If your concern is about how to test this general code, I plan to use the `hypothesis` package, which does a great job of generating test suites given a general rule.
Oh, that looks very cool!
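For illustration, a `hypothesis`-based property test might look like this; the property and function name are invented for the example and are not actual snewpy tests:

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays


# property: integrating a non-negative flux over time is non-negative,
# for any array that hypothesis generates
@given(arrays(np.float64, (5, 3), elements=st.floats(0, 1e6)))
def test_time_integral_is_nonnegative(flux_array):
    time = np.linspace(0, 1, 5)
    assert np.all(np.trapz(flux_array, x=time, axis=0) >= 0)


test_time_integral_is_nonnegative()  # hypothesis runs many generated cases
```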
Before we get too bogged down in details of the `Flux` class, I should add that the bulk of this PR looks really good to me!

Removing duplicate code from the `get_time_series()` and `get_fluence()` functions is really nice; and moving much of that (particularly the distance calculation) into a `Flux` class is conceptually cleaner. Letting `t` in `get_initial_spectra()` be a vector as well is both more consistent and a huge performance benefit, so very nice as well!
(It’s easy in these code discussions to focus on the bits we disagree with; so we should occasionally take time to highlight all the things we agree with and think are great! 🤝)
I'm sorry to continue this discussion, but I can't agree with you on this.
> Wouldn’t it be more efficient to combine all transformations first, and then apply the resulting matrix to the data, rather than applying each transformation to the large array individually?
Nope: as you can see, our flavor transformations are (in the most general case) functions of energy and time. So combining them still means working with the same huge (flavor₁ × flavor₂ × energy × time) arrays.
> Agree that a single line would be nicer; but having one line per flavour is still fairly short and very easy to understand.
As physicists, we use matrix multiplication to describe flavor transformations. I don't understand how treating the flavors separately would be easier to understand.
> 1. A flux, a fluence and a collection of fluxes in different flavours are physically very different things.
Okay, maybe "Flux" is not the best name for this class. It's an N-dimensional data table with definite axes. The physical meaning is kept by `self.array.unit`, so it's easy to determine how it should be treated - integrated, multiplied etc. - to get physical values.
> 2. Supporting an arbitrary number of dimensions, when we plan to use only time and energy (and perhaps flavour; see 1.), makes the code more complex than it needs to be.
I thought about having additional dimensions: some models have angular dependence, progenitor mass and other additional parameters. In my solution all these parameters can be extra dimensions, so we can do fast vectorized calculations instead of several Python loops if one wants to scan these parameters. Also, the flavor transformations will work on an arbitrary array shape, because we can easily determine the "Flavor" axis, and all the remaining dimensions are treated automatically.
I agree that the class interface can be improved (I'll get rid of 2 of the 3 `get_axis_*` methods). And I agree that it's important to keep things simple, but you're talking about `dict[Flavor, array]` vs `array` simplicity - and I can't agree to that.
If we stick to using python loops, we'll eventually have to refactor everything again, to optimize loops over other parameters - as I did here for the time.
Sorry I'm late to the discussion - I had a couple of important deadlines this week.
In hindsight, breaking the equations up by neutrino flavor was a mistake, and it would have been better to keep the flavor as an array dimension. As the flavor transformations become more complex (3-flavour, time- and energy-dependent numerical calculations), I think we will find that breaking it up by flavor really hurts performance, because oftentimes all the transition probabilities can be computed simultaneously. The solution we are considering is to add a `get_probabilities` method to the flavor_transformation classes that returns a matrix converting the array of initial flavor spectra into the array of flavor spectra at the detector. The 1/(4πd²) is not included. We'll keep `prob_ee` etc. for backwards compatibility for now, but all those will do is pick out elements from the matrix.
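The `get_probabilities` idea could be sketched as follows; the array shapes and random placeholder values are assumptions for illustration, and (as noted above) the 1/(4πd²) distance factor is not included:

```python
import numpy as np

n_flavors, n_times, n_energies = 4, 200, 50
rng = np.random.default_rng(1)
initial = rng.random((n_flavors, n_times, n_energies))

# P[i, j, t, e]: probability that initial flavor j is observed as
# flavor i, depending on time and energy (placeholder values here)
P = rng.random((n_flavors, n_flavors, n_times, n_energies))

# contract over the initial-flavor axis for all (t, e) bins at once,
# computing every transition probability simultaneously
detected = np.einsum('ijte,jte->ite', P, initial)
```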
But I don't see a need to go beyond the three dimensions of flavor x energy x time. It is true a user may be looping over models but I don't like having a model number as an array dimension because it's meaningless. Finally, I'm wary of making the SNEWPY learning curve too steep if we make things too general. My take has always been that Python is useful because it's easy to do simple things quickly, not because it's efficient. If I want efficient code then I write it in C.
> If we stick to using python loops, we'll eventually have to refactor everything again, to optimize loops over other parameters - as I did here for the time.
The problem with this argument is that time is fundamentally different from other parameters: There are potentially thousands of time bins in a given model (12,380 for the Bollig_2016 s11.2 progenitor), but there are only ever four flavours (six, if we distinguish mu and tau at some point in the future).
The performance comparison in your post above is impressive (6–13× improvement). However, I did some more detailed benchmarking on `generate_time_series('models/Bollig_2016/s11.2c', 'Bollig_2016', 'NoTransformation', d=10)` and here’s how the remaining run time (after your changes) is spent:

| Total | 2.2 s |
|---|---|
| setup | 1.9 s |
| `get_transformed_flux` | 0.005 s |
| write tarball | 0.3 s |
Almost 90% of the remaining running time is spent on setup (reading model data from the files, etc.), the remaining ~10% is spent writing the tarball, and the actual flux calculations (including the non-vectorised calculation of flavour components in `base.py` which you cite above) make up less than 0.3% of the total running time. Of course, exact numbers may vary for different models, etc.; but even if it’s closer to 1%, it looks like we will not gain any noticeable benefit whatsoever from vectorising over the flavours (or any other optimisation of the flux calculations for additional parameters).
Sorry, I decided to step away from this task for a while to give it some more thought and to stay constructive. However, I don't see any solution.

@JostMigenda you address one of my arguments, while completely ignoring all the rest.
The benchmark you made is very nice, however it is quite synthetic: it considers no flavor transformation. In this simple case we really get very little benefit from vectorizing; but as I said, in the case of more complicated flavour transformations we get multiplications of many ndarrays, and thousands of `prob_ee` calls. I don't see why on Earth this shouldn't be done via matrix multiplication.
I don't understand why you need to compare with the setup and output time. These are separate routines, and the workflow can change (e.g. if the user needs to do some calculations with the transformed flux, there's no need to write it and read it again). This benchmark basically shows that our setup and output routines could be better.
I don't see any benefit at all in sticking to dicts instead of building a special class. Maybe using the built-in class gives some psychological comfort, but it smears the semantics of the code: a dict is a much more general-purpose object than any `Flux` (or whatever name is more appropriate) class with an exact physical meaning and operations.

Again, if you doubt the correctness of my implementation, that's what the tests are for.
Talking about complexity for the user: in the current setup, the details of how the flux is represented are hidden from the user, as long as they stick to our `generate_fluence` methods etc. My refactoring here just makes things more decoupled in the internal implementation, while we can have any ridiculously simple user interface we want. So I believe that my changes don't make the SNEWPY learning curve steeper.
I am grateful for the thorough feedback. But I don't think I can degrade the solution to fixed dimensions and dicts, as you suggest. Since this seems to be a really blocking point, I suggest closing this PR (or passing it to someone who can do things better).
Thank you for the update, @Sheshuk. I’m sorry that this discussion has not made much progress and I understand your frustration. Perhaps it is better to pause this for now.
As noted earlier, there were significant changes in this PR that we all agree are good and useful; so we should at least include those. If you would prefer not to work on this for a while, I can put that on my to-do list. (Advance warning: I won’t be able to do much work on SNEWPY for the next several weeks, so probably wouldn’t get to this before March/April. If anyone else wants to look into this in the meantime, please go ahead!)
Regarding the `Flux` class itself, I would suggest tabling that for now and revisiting it once the more complex flavour transformations Jim mentioned are implemented and we can benchmark things and see where optimizations are needed. If the timeline works out, we could schedule some time at the hackathon this summer for this; perhaps discussing it in person (or at least on Zoom) is a more promising way to resolve this than in writing?
Making the `generate_fluence` and `generate_time_series` functions work faster:

- make `get_initial_spectra` work with a vector of `t`
- add `SupernovaModel.get_oscillated_flux`, so that all the conversion factors stay in one place
- add a `Flux` class, which is basically a container for a `numpy.ndarray` together with user-defined axes. It can be used to slice or integrate the flux along any axis.
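A minimal sketch of such a container (a simplified stand-in, not the actual implementation in this PR):

```python
import numpy as np


class Flux:
    def __init__(self, array, **axes):
        # axes maps name -> 1D array of axis values, in array-dimension order
        self.array = np.asarray(array)
        self.axes = dict(axes)

    def integrate(self, name):
        """Integrate over the named axis, dropping it from the result."""
        axis = list(self.axes).index(name)
        integrated = np.trapz(self.array, x=self.axes[name], axis=axis)
        remaining = {k: v for k, v in self.axes.items() if k != name}
        return Flux(integrated, **remaining)


flux = Flux(np.ones((4, 10, 20)),
            flavor=np.arange(4),
            time=np.linspace(0, 1, 10),
            energy=np.linspace(1, 50, 20))
fluence = flux.integrate('time')  # shape (4, 20); axes: flavor, energy
```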