cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License
64 stars 268 forks source link

Variance extractor #2543

Closed ctoennis closed 2 months ago

ctoennis commented 5 months ago

This pull request is to add a new ImageExtractor subclass to generate images from the variance of waveforms that will be needed for the pointing calibration using stars. This request is related to issue #2542 .

kosack commented 5 months ago

If I understand correctly, what you are basically doing here is computing an event-wise pedestal variance. In the data model (see the original top-level data model document), there are a few places you can put pedestals:

So I think what you are computing is the first one, and you should store it as pedestal information: It is in fact a second measurement of the pedestal and its variance, and could be compared to the other methods (even if the use case you are describing here is for doing star tracking, star tracking is computed from the pedestal variances, which can be over many time scales)

fwerner commented 5 months ago

In addition to what has been said I'd suggest to clarify the purpose of this extractor: it is calculating the waveform sample variance, which should be reflected in the name/docstring to communicate that it has a complicated relationship to charge.

ctoennis commented 5 months ago

The thing we want to calculate is the variance of the waveforms specifically of interleaved pedestal events and not the variance of the pedestal in units of charge.

@kosack : What we get is a sort of image based on the variance of the waveform in each pixel. What we get is not a pedestal value, so putting it in a pedestal container is not the best option i think.

I think it is best to write a container appropriate for these variance images similar to the DL1CameraContainer

ctoennis commented 5 months ago

I just added a custom continer for the variance images. I also compared the variance calculation using just a plain variance and a variance of integrated chunks of the waveforms using the FixedWindowSum extractor in the plot below. comparemethod

The x axis shows the number of images used in calculating an average variance value that is shown on the y axis.

Though the methods are generally statistically equivalent the average variance becomes stable with fewer images when using a simple variance over the variance from integrated chunks.

maxnoe commented 5 months ago

Though the methods are generally statistically equivalent the average variance becomes stable with fewer images when using a simple variance over the variance from integrated chunks.

That's not surprising, since you have just more samples. But you change of what you compute the variance

ctoennis commented 4 months ago

Hi all, can we restart the discussion here? I made a small update to the extractor to contain timing information that will later be needed in the corresponding subclass of StatisticsExtractor that i am working on with Tjark.

There was a decision needed as to where the variance images will be attached to. Should this find a place somewhere in dl1.event.telescope.pedestal or something similar?

maxnoe commented 4 months ago

Hi @ctoennis the lint step in the CI is failing due to some typos and wrong formatting.

To check locallly, make sure to install the pre-commit hooks: https://ctapipe.readthedocs.io/en/latest/developer-guide/getting-started.html#installing-ctapipe-in-development-mode

ctao-dpps-sonarqube[bot] commented 4 months ago

Passed

Analysis Details

1 Issue

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 4 months ago

Passed

Analysis Details

1 Issue

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctoennis commented 4 months ago

i fixed all the isses with the spelling, code checks etc.

ctoennis commented 2 months ago

i can remove trigger time. But i am not sure how to proceed on the difference to the ImageExtractor API. I could make it that the variance image is stored in a DL1ImageContainer and that it complies with the API or make it a separate class that doesn't inherit from the ImageExtractor.

I think what i generate here is a type of image as i try to look for stars in the camera FOV with them. But if it makes more sense to you it can be separate from the ImageExtractors.

maxnoe commented 2 months ago

The question is if it is possible to adhere to the API that is expected from the image extractors: if yes, no problem, let it be an image extractor. But that means for example the call above should work and correctly write out the data to the HDF5 file.

If you need the different API, it should be its own thing and we need to think about how to integrate it into the process tool for example.

kosack commented 2 months ago

The question is if it is possible to adhere to the API that is expected from the image extractors: if yes, no problem, let it be an image extractor. But that means for example the call above should work and correctly write out the data to the HDF5 file.

I think what this class does is not the same as an ImageExtractor, or am I missing something? ImageExtractors take a "movie" (a set of waveforms per pixel for one event) for a single event and convert them to set of time-independent images that must include the peak time and integral over a window, but can also optionally inlcude any other parameters of the waveform like the pulse width or variabce. Generally, there is only one ImageExtractor that is run at a during processing. That doesn't seem to be what you have here.

The idea of the ImageExtractors was that low-level routines like those used to compute peak time or integral are implemeted as basic functions taking numpy inpus, and the ImageExtractor composes several of those methods into one set of algorithms. So what I would expect here is more like a function to compute the variance, and then that function is added to one or more of the existing ImageExtractors, perhaps as an option. So that then the chosen ImageExtractor generates the required peak time and integral, but also the variance

maxnoe commented 2 months ago

I think what this class does is not the same as an ImageExtractor, or am I missing something?

You could think of this as an ImageExtractor that uses the variance of each waveform to produce the image and doesn't produce a peak_time.

mexanick commented 2 months ago

I believe, it is an ImageExtractor, the __call__() signature is of course wrong here. It should copy the one of the base extractor:

@abstractmethod
    def __call__(
        self, waveforms, tel_id, selected_gain_channel, broken_pixels
    ) -> DL1CameraContainer:

Only waveforms are then used, the other parameters are simply ignored.

The only issue, that has to be resolved, is the return type. DL1CameraContainer in its current shape contains a number of fields that are not needed in this case (actually, everything except image).

I'd propose to add an intermediate container:

class CameraImageContainer(Container):
    """
    Storage of output of generic camera images (any reduction of waveforms).
    """

    image = Field(
        None,
        "Numpy array of camera image, after waveform extraction."
        "Shape: (n_pixel) if n_channels is 1 or data is gain selected"
        "else: (n_channels, n_pixel)",
    )

class DL1CameraContainer(CameraImageContainer):
    """
    Storage of output of camera calibration e.g the final calibrated
    image in intensity units and the pulse time.
    """
    peak_time = Field(
        None,
        "Numpy array containing position of the peak of the pulse as determined by "
        "the extractor."
        "Shape: (n_pixel) if n_channels is 1 or data is gain selected"
        "else: (n_channels, n_pixel)",
    )
    # rest of the DL1CameraContainer fields

class VarianceContainer(CameraImageContainer):
    """
    Storage of output of camera variance image.

    Stores the variance of waveform in each pixel, composed as an image.
    """
    variance_method = Field(
        VarianceType.SIMPLE,
        "Method by which the variance was calculated"
        "This can either be a plain variance"
        "or a variance of integrated samples",
        type=VarianceType,
    )
    # I don't think you need anything else here. The trigger time you extract from elsewhere in the event tree.

and change the return type of base ImageExtractor to CameraImageContainer

maxnoe commented 2 months ago

If it's just that the VarianceExtractor doesn't provide a peak_time, I think that's fine, just leave it None (the default value).

Adding an additional field to the DL1CameraContainer that's an "image_type" sounds like a good idea.

I would avoid having extractors that have different return types.

mexanick commented 2 months ago

Following a private discussion with @maxnoe , the new Container class is not needed. @ctoennis just use the DL1CameraContainer as a return type, setting the extractor type as a container meta keyword. You only need to fill Image field and leave all others as defaults (None). In this case, the serialization will be done correctly and only the image that you need will be stored. I.e. something like this (assuming you change VarianceType enum to WAVEFORM):

def __call__(
        self, waveforms, tel_id, selected_gain_channel, broken_pixels
    ) -> DL1CameraContainer:
    container = DL1CameraContainer(image=np.nanvar(waveforms, dtype="float32", axis=2))
    container.meta["ExtractionMethod"] = str(VarianceType.WAVEFORM)
    return container
maxnoe commented 2 months ago

Since this is also somewhat connected to #1243

I.e. we could introduce a new class e.g. WaveformProperties that runs after the CameraCalibrator and computes stuff like variance, FWHM, etc.

mexanick commented 2 months ago

Since this is also somewhat connected to #1243

I.e. we could introduce a new class e.g. WaveformProperties that runs after the CameraCalibrator and computes stuff like variance, FWHM, etc.

While this might be a good idea, I'd save it for a separate refactoring PR and merge the current (small) addition as is.

ctao-dpps-sonarqube[bot] commented 2 months ago

Passed

Analysis Details

0 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctoennis commented 2 months ago

So, is this request then good to merge?

maxnoe commented 2 months ago

@ctoennis From my side yes, but we require two approvals and @kosack is on vacation right now. Since he had comments / requests for changes before, I'd like to wait for his approval here.

kosack commented 2 months ago

I'll take a look tomorrow or today hopefully! Might take a bit of time since there are a lot of changes.

ctoennis commented 2 months ago

i added a test to check that an actual variance is being calculated.

ctao-dpps-sonarqube[bot] commented 2 months ago

Passed

Analysis Details

0 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube