Open calispac opened 5 years ago
I was always wondering why we are using custom histogram classes, while numpy offers both .histogram and .histogram2d (even .histogramdd) for histogramming. I might be not fully aware of the full functionality of our solution, but is it possible to just use numpy?
Numpy histograms does not offer histograming for data having shape = (m,n) For instance we usually have per pixel histograms:
bins = np.arange(2**12)
hist = Histogram1D(data_shape=(n_pixels,), bins_edges=bins) # n_pixels 1D histograms
for event in events:
waveform = event.waveform # (n_pixels, n_samples)
hist.fill(waveform)
Using numpy.histogram One would need to do:
bins = np.arange(2**12)
hist = np.zeros((n_pixels, len(bins))
for event in events:
waveform = event.waveform # (n_pixels, n_samples)
for pix in range(len(waveform)):
hist[pix] += np.histogram(waveform[pix], bins=bins)
The second for loop is quiet costly. The classes would not be useful if np.histogram
had an axis=
argument. If you give np.histogram
a >2D np.array
it will first flatten the array to a 1D np.array
and then histogram all the values therefore you loose the pixel information.
Also with the Histogram1D
class we built a HistogramFitter
class to fit histograms.
I hope this clarifies
In such a case, I agree that hist2d.py should be merged within https://github.com/cta-sst-1m/histogram
Actually we are using numpy histograms, at least in this implementation in utils.hist2d.Histgram2D
... the class Histogram2D
is merely a wrapper or a helper around the more ugly parts of filling a numpy histogram with multiple events.
https://github.com/cta-sst-1m/digicampipe/blob/master/digicampipe/utils/hist2d.py#L12
That is ... if I remember correctly... I totally omitted all comments which might explain what this class is good for :-(
It is used to make pulse template. Also it includes a Histogram2dChunked class which use some kind of buffer to make the filling much faster.
Currently we are using two histogram classes :
I think we should merge the two in the same repo (either the histogram repo or digicampipe)
To my opinion it should be merged within https://github.com/cta-sst-1m/histogram since these functionalities are not in principle useful for
digicampipe
only.What do you think?