cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Regression in NeighborPeakWindowSum #2559

Closed: kosack closed this issue 1 month ago

kosack commented 1 month ago

Describe the bug

While testing the datapipe benchmark framework, we (@Tobychev and I) found that files processed with ctapipe 0.21 show a much larger charge bias at low intensities than files processed with ctapipe 0.20 when using the NeighborPeakWindowSum charge extractor. This seems to be a regression.

To Reproduce

Process the same file with ctapipe 0.20 and ctapipe 0.21 and compare the pixel charge distributions.

Below we show the results for local and neighbor peak methods compared to the two ctapipe versions.
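The charge-bias comparison shown in the plot can be sketched in pure numpy. This is a minimal illustration with synthetic stand-in arrays (in the real benchmark, `true_pe` would be the simulated photoelectron counts and `charge` the extracted pixel charges from each ctapipe version):

```python
import numpy as np

# Hypothetical stand-in data for illustration only
rng = np.random.default_rng(42)
true_pe = rng.exponential(scale=20.0, size=10_000)           # "true" pixel intensities
charge = true_pe + rng.normal(scale=2.0, size=true_pe.size)  # extracted charge with noise

# Relative charge bias per intensity bin: mean of (Q_reco - Q_true) / Q_true
bins = np.geomspace(1, 200, 21)
bin_idx = np.digitize(true_pe, bins)
bias = np.array([
    np.mean((charge[bin_idx == i] - true_pe[bin_idx == i]) / true_pe[bin_idx == i])
    for i in range(1, len(bins))
    if np.any(bin_idx == i)
])

print(bias)
```

Plotting such a bias curve for each version/extractor combination is what makes a low-intensity regression visible at a glance.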

Expected behavior

No change

Supporting information

[figure: pixel charge distributions for the local and neighbor peak methods, comparing ctapipe 0.20 and 0.21]


kosack commented 1 month ago

My quick guess is it could be related to #2529 which is the most recent change

maxnoe commented 1 month ago

> My quick guess is it could be related to https://github.com/cta-observatory/ctapipe/pull/2529 which is the most recent change

I think it's the only change between 0.20 and 0.21 that could do something like this

maxnoe commented 1 month ago

Could you say a bit more about how you generated this?

I made a simple test and I get binary-identical images for 0.19.3, 0.20.0 and 0.21.1 using this example script:

import numpy as np

from ctapipe.calib import CameraCalibrator
from ctapipe.io import EventSource
from ctapipe import __version__

# Read the first event from the prod5 gamma test dataset
with EventSource("dataset://gamma_prod5.simtel.zst") as source:
    event = next(iter(source))
    subarray = source.subarray

# Calibrate to DL1 using the NeighborPeakWindowSum extractor
calib = CameraCalibrator(subarray, image_extractor_type="NeighborPeakWindowSum")
calib(event)

# Save the first telescope's image, tagged with the ctapipe version
tel, dl1 = next(iter(event.dl1.tel.items()))
np.save(f"image_{__version__}.npy", dl1.image)

Comparing the saved images in IPython:

In [1]: import numpy as np

In [2]: img19 = np.load("./image_0.19.3.npy")

In [3]: img21 = np.load("./image_0.21.1.npy")

In [4]: img19
Out[4]: 
array([-1.0826262 , -0.25593492,  0.07813098, ..., -3.2625198 ,
        0.95545363, -0.35306644], dtype=float32)

In [5]: img21
Out[5]: 
array([-1.0826262 , -0.25593492,  0.07813098, ..., -3.2625198 ,
        0.95545363, -0.35306644], dtype=float32)

In [6]: img21 == img19
Out[6]: array([ True,  True,  True, ...,  True,  True,  True])

In [7]: np.all(img21 == img19)
Out[7]: True
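For reference, the elementwise check above can be collapsed into a single call with numpy's standard comparison helpers; a minimal sketch with small stand-in arrays in place of the saved images:

```python
import numpy as np

# Stand-in arrays for the saved images (first three values from the transcript)
a = np.array([-1.0826262, -0.25593492, 0.07813098], dtype=np.float32)
b = a.copy()

# True only if shapes match and all elements are exactly equal
print(np.array_equal(a, b))

# For results that are close but not bit-identical, use a tolerance instead
print(np.allclose(a, b))
```

`np.array_equal` is the right check when bit-identical output is expected, as here; `np.allclose` is the usual choice when only numerical agreement within floating-point tolerance is required.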
kosack commented 1 month ago

You're right. I think this must be a bug in the benchmark code. I just opened both input files and did the following:

from ctapipe.io import TableLoader
import numpy as np

loader20 = TableLoader("ctapipe-0.20.0/gamma_20deg_0deg_run1000___cta-prod5b-lapalma_desert-2158m-LaPalma-dark_cone10_neighborpeak.DL1.h5")
loader21 = TableLoader("ctapipe-0.21.dev/gamma_20deg_0deg_run1000___cta-prod5b-lapalma_desert-2158m-LaPalma-dark_cone10_neighborpeak.DL1.h5")

im20 = loader20.read_telescope_events_by_type("LST_LST_LSTCam", dl1_images=True)['LST_LST_LSTCam']['image']
im21 = loader21.read_telescope_events_by_type("LST_LST_LSTCam", dl1_images=True)['LST_LST_LSTCam']['image']

np.all((im20 - im21) == 0)  # True: the stored images are identical

So I will close this and try to debug what is going wrong. It's quite odd that the binned plots look so different, when the input data are identical!
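One way to rule out the binning itself as the culprit is to histogram both arrays with the same bins and compare counts directly. A minimal sketch with synthetic stand-in arrays (the real inputs would be the im20 and im21 columns loaded above):

```python
import numpy as np

# Stand-in arrays: identical data, as observed for the two ctapipe versions
rng = np.random.default_rng(0)
im20 = rng.normal(size=100_000).astype(np.float32)
im21 = im20.copy()

# Histogram both with the same binning
bins = np.linspace(-5, 5, 101)
counts20, _ = np.histogram(im20, bins=bins)
counts21, _ = np.histogram(im21, bins=bins)

# Identical inputs must give identical histograms; if the plots still
# differ, the discrepancy is downstream of the data (e.g. in labeling)
print(np.array_equal(counts20, counts21))
```

If this check passes while the rendered plots still disagree, the problem has to be in the plotting or labeling layer rather than in the data.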

Anyhow, false alarm!

kosack commented 1 month ago

@Tobychev any idea how the results could look different with identical input data? I also notice that the LocalPeak and NeighborPeak results are identical, which makes no sense either. I even checked that I didn't accidentally transpose the inputs.

kosack commented 1 month ago

For info: due to the strange way the benchmark system names the plots, the plot labels got scrambled. The lines themselves were correct, but the labels were wrong. So the differences shown are just the difference between the local and neighbor methods; the two ctapipe versions agree perfectly.

Therefore this is not an issue, and it is closed.