cta-sst-1m / digicampipe

DigiCam pipeline based on ctapipe
GNU General Public License v3.0
3 stars 3 forks source link

1D and 2D histograming to be moved to cta-sst-1m/histogram? #253

Open calispac opened 5 years ago

calispac commented 5 years ago

Currently we are using two histogram classes :

  1. https://github.com/cta-sst-1m/histogram (1D histogram and fitting)
  2. https://github.com/cta-sst-1m/digicampipe/blob/master/digicampipe/utils/hist2d.py (2D histogram)

I think we should merge the two in the same repo (either the histogram repo or digicampipe)

To my opinion it should be merged within https://github.com/cta-sst-1m/histogram since these functionalities are not in principle useful for digicampipe only.

What do you think?

moderski commented 5 years ago

I was always wondering why we are using custom histogram classes, while numpy offers both .histogram and .histogram2d (even .histogramdd) for histogramming. I might be not fully aware of the full functionality of our solution, but is it possible to just use numpy?

calispac commented 5 years ago

Numpy histograms does not offer histograming for data having shape = (m,n) For instance we usually have per pixel histograms:

bins = np.arange(2**12)
hist = Histogram1D(data_shape=(n_pixels,), bins_edges=bins) # n_pixels 1D histograms
for event in events:
    waveform = event.waveform # (n_pixels, n_samples)
    hist.fill(waveform)

Using numpy.histogram One would need to do:

bins = np.arange(2**12)
hist = np.zeros((n_pixels, len(bins))
for event in events:
    waveform = event.waveform # (n_pixels, n_samples)
    for pix in range(len(waveform)):
       hist[pix] += np.histogram(waveform[pix], bins=bins)

The second for loop is quiet costly. The classes would not be useful if np.histogram had an axis= argument. If you give np.histogram a >2D np.array it will first flatten the array to a 1D np.array and then histogram all the values therefore you loose the pixel information.

Also with the Histogram1D class we built a HistogramFitter class to fit histograms.

I hope this clarifies

moderski commented 5 years ago

In such a case, I agree that hist2d.py should be merged within https://github.com/cta-sst-1m/histogram

dneise commented 5 years ago

Actually we are using numpy histograms, at least in this implementation in utils.hist2d.Histgram2D... the class Histogram2D is merely a wrapper or a helper around the more ugly parts of filling a numpy histogram with multiple events.

https://github.com/cta-sst-1m/digicampipe/blob/master/digicampipe/utils/hist2d.py#L12

That is ... if I remember correctly... I totally omitted all comments which might explain what this class is good for :-(

yrenier commented 5 years ago

It is used to make pulse template. Also it includes a Histogram2dChunked class which use some kind of buffer to make the filling much faster.