cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License
65 stars 269 forks source link

Add a modifiable unit-distance to the initialization of HillasParametersContainer #1413

Closed HealthyPear closed 3 years ago

HealthyPear commented 4 years ago

Context

In the framework of the pipeline, we should clean and parametrize all calibrated images and always save all information into DL1. This means that sometimes certain cleaned images will result in a failing parametrization which should be recorded. In protopipe I take care of this case by initializing a dummy ctapipe.containers.HillasParametersContainer and record it. Then I attach to each image a good_for_reco integer variable (0 or 1) to record if this image has been used for direction reconstruction (is this also in the ctapipe DL1 format? I don't remember...if not, I propose to add something similar) and to each shower an is_valid boolean variable to record if it has been correctly reconstructed (this is analog to ctapipe).

Now with the changes in PR #1408 , it is possible to clean and parametrize in the ctapipe.coordinates.TelescopeFrame, from which all distances in the parametrized image (pixel coordinates, width, and length) are in units of degrees.

Proposal and attempted solution

What I propose is to add a default attribute to the initialization of ctapipe.containers.HillasParametersContainer which defaults to meters, but that can be set to degrees, aka


from ctapipe.containers import HillasParametersContainer
hillas = HillasParametersContainer(unit=u.deg)

I attempted to do this in the following way,

class HillasParametersContainer(Container):

    def __init__(self, unit=u.m, **fields):
        super().__init__(**fields)
        self.unit = unit

        container_prefix = "hillas"

        intensity = Field(nan, "total intensity (size)")

        x = Field(nan * self.unit, "centroid x coordinate", unit=self.unit)
        y = Field(nan * self.unit, "centroid x coordinate", unit=self.unit)
        r = Field(nan * self.unit, "radial coordinate of centroid", unit=self.unit)
        phi = Field(nan * u.deg, "polar coordinate of centroid", unit=u.deg)

        length = Field(
            nan * self.unit, "standard deviation along the major-axis", unit=self.unit
        )
        width = Field(
            nan * self.unit, "standard spread along the minor-axis", unit=self.unit
        )
        psi = Field(nan * u.deg, "rotation angle of ellipse", unit=u.deg)

        skewness = Field(nan, "measure of the asymmetry")
        kurtosis = Field(nan, "measure of the tailedness")

but it doesn't work and it returns the following error,

AttributeError: 'HillasParametersContainer' object has no attribute 'unit'

which I thought was the entire point of my modification :)

Maybe I am missing something deep about the parent class ctapipe.core.Container? I say this because a simple code snippet works, e.g.

class Parent(object):
    def __init__(self):
        self.var = 1
a = Parent()
a.var

returns 1

class Child(Parent):
    def __init__(self, unit="meters", *args):
        super().__init__(*args)
        self.unit = unit
b = Child()
b.unit
b.var

returns "meters" and 1

c = Child(unit="degrees")
c.unit
c.var

returns "degrees" and 1

HealthyPear commented 4 years ago

Ah. It occurs to me that perhaps the problem is indeed how ctapipe.core.Container is built, or better how its metadata class works.

I can't add an attribute to a child container, can't I? Only the fields defined in the dictionary are written at instantiation and that's it?

LukasNickel commented 4 years ago

In protopipe I take care of this case by initializing a dummy ctapipe.containers.HillasParametersContainer and record it. Then I attach to each image a good_for_reco integer variable (0 or 1) to record if this image has been used for direction reconstruction (is this also in the ctapipe DL1 format? I don't remember...if not, I propose to add something similar) and to each shower an is_valid boolean variable to record if it has been correctly reconstructed (this is analog to ctapipe).

The stage-1 tool initializes an empty ImageParametersContainer (which, amongst others, includes a HillasParametersContainer) if the event can not be reconstructed based on some criteria. https://github.com/cta-observatory/ctapipe/blob/052fe3bde981432be455c00b5dfa70a0ebb8efec/ctapipe/tools/stage1.py#L457-L459 There are no additional validity variables, but you can just discard the nan events, which would match your is_valid. If you want to keep track of the events used for later stereoscopic analysis tasks, you can save them directly in the relevant container. For example: https://github.com/cta-observatory/ctapipe/blob/052fe3bde981432be455c00b5dfa70a0ebb8efec/ctapipe/containers.py#L528-L561

Regarding the unit <-> container part: I personally do not know if we even want the units in the container fields to be anything but constant, but there might be reason to avoid all sorts of conversions slowing us down. You could access the container fields after initializing the container instance to change the units that are written to the file, define new container classes or work on the main class (e.g. adding a units dict?). Maybe @kosack has an opinion on how to do this.

maxnoe commented 4 years ago

@HealthyPear In you example you are mixing and confusing class definition (what is in the body after class) with instance initialization (the __init__ method). This simply is not how containers work. Creating local variables in __init__ does nothing to that instance. You have to assign them to the instance: self.x = 5 instead of x = 5.

Fields are class members, so have to be declared in the class definition.

I see two possibilities here.

  1. Change Container / Field to allow what you propose. This is not straight forward and I don't see an easy way to do this and still have most of the nice properties of containers we currently have.

  2. Create a new container HillasParametersDegree or similar, that has units of degree. Then, the hillas_parameter function returns either HillasParameters or HillasParametersDegrees depending on the unit of pixel_x / pixel_y

HealthyPear commented 4 years ago

The stage-1 tool initializes an empty ImageParametersContainer (which, amongst others, includes a HillasParametersContainer) if the event can not be reconstructed based on some criteria.

OK that is good

There are no additional validity variables, but you can just discard the nan events, which would match your is_valid.

I don't think this does the same think though, because the is_valid variable is related to the shower, not to the image (at least in protopipe). You could have a correctly reconstructed shower which triggered e.g. 5 telescopes, of which only 3 returned useful images. So what I am proposing is to add something that provides an immediate (meaning "by eye") information on whether that single image was used.

If you want to keep track of the events used for later stereoscopic analysis tasks, you can save them directly in the relevant container.

Yes, I am doing this

Regarding the unit <-> container part: I personally do not know if we even want the units in the container fields to be anything but constant, but there might be reason to avoid all sorts of conversions slowing us down. You could access the container fields after initializing the container instance to change the units that are written to the file, define new container classes or work on the main class (e.g. adding a units dict?). Maybe @kosack has an opinion on how to do this.

Well I do not convert units in this case (I can't in fact - meters to degrees it's not covertible). It's more of an initialization thing

HealthyPear commented 4 years ago

@HealthyPear In you example you are mixing and confusing class definition (what is in the body after class) with instance initialization (the __init__ method). This simply is not how containers work. Creating local variables in __init__ does nothing to that instance. You have to assign them to the instance: self.x = 5 instead of x = 5.

Fields are class members, so have to be declared in the class definition.

I am a bit lost, what would be "x" in my case?

I see two possibilities here.

1. Change Container / Field to allow what you propose. This is not straight forward and I don't see an easy way to do this and still have most of the nice properties of containers we currently have.

Yes, I wanted to avoid this

2. Create a new container `HillasParametersDegree` or similar, that has units of degree.
   Then, the `hillas_parameter` function returns either `HillasParameters` or `HillasParametersDegrees` depending on the unit of pixel_x / pixel_y

This seems a fast and at least temporary solution.

I guess we could rename the current container as HillasParametersContainerCameraFrame and the new one HillasParametersContainerTelescopeFrame to avoid confusion

maxnoe commented 4 years ago

I am a bit lost, what would be "x" in my case?

Going back to your example, I add comments:

class HillasParametersContainer(Container):

     # Containers expect field declarations here, as class members
     # Thus, this container does not have any fields

    def __init__(self, unit=u.m, **fields):
        super().__init__(**fields)

        # this fails because containers allow only pre-declared fields to be set (via `__slots__`)
        # and this container has no field `unit`, it does not have any field.
        self.unit = unit

        # the following lines only declare local variables in this method and do not affect anything
        container_prefix = "hillas"

        intensity = Field(nan, "total intensity (size)")

        x = Field(nan * self.unit, "centroid x coordinate", unit=self.unit)
        y = Field(nan * self.unit, "centroid x coordinate", unit=self.unit)

Please look into the differences between class and instance variables and how each of them are set in python.

Well I do not convert units in this case (I can't in fact - meters to degrees it's not covertible).

Astropy has something called equivalencies to handle such cases, e.g. you can convert frequencies to wavelength with the u.spectral equivalence:

wavelength = 632 * u.nm
frequency = wavelength.to(u.Hz, equivalencies=u.spectral())
HealthyPear commented 4 years ago

Ah ok, now I understand better.

Then yes, unit should be a Field in order to be recognized by the container, but it doesn't make sense. I confirm that the most viable solution is to create a dedicated container for this.

If we agree then I would add this change directly to PR #1408 , since it is directly related to it.

It is not urgent for me (I needed to know also how to patch protopipe before the next release), so we can wait for other inputs if necessary.

HealthyPear commented 4 years ago

Astropy has something called equivalencies to handle such cases, e.g. you can convert frequencies to wavelength with the u.spectral equivalence:

wavelength = 632 * u.nm
frequency = wavelength.to(u.Hz, equivalencies=u.spectral())

Sorry I didn't see this part!

I didn't know about this feature, indeed there seems to be something very similar (if not the same thing).

Though I thought that containers are read-only. However, even if one managed to open the container and modify the units using this feature, I think that calling the equivalence method and modifying the container would take more computing resources than just define a dedicated container and instantiating it.

kosack commented 4 years ago

Why don't we just change the HillasParameterContainer's defaults to all deg and leave it at that? I don't see any reason to support meters anymore - in the next version we will use the new HillasReconstructor that does it in deg, and we should at the same time modify ctapipe-stage1 to do the same.

All of this conversation is to find a hacky way to support a deprecated way of doing things, I think!

maxnoe commented 4 years ago

I agree. We should just fully make the switch to calculating it in telescope frame.

However, than we need to discuss about cog names.

These should probably then be: cog_fov_lon, cog_fov_lat, instead of x / y.

kosack commented 4 years ago

Also, note that the current units in the Container definition are only ever checked if the user calls hillas_parameters.validate(), otherwise they are just metadata that are ignored, and are used as a suggestion. No automatic conversion is done or anything. The only case where they are important is that the first time the HDF5TableWriter gets an event, it will assign and check the units of the instance, so if the first event is "NaN" and you didn't explicitly set it to e.g. hillas.length=NaN*u.deg it would set up the output in meters.

maxnoe commented 4 years ago

Won't the hdf5 writer fail if subsequent events have the wrong units?

Though one would have to really mess up hard to create this situation (other than a wrong default in the container, which should be fixed now everywhere).

kosack commented 4 years ago

Won't the hdf5 writer fail if subsequent events have the wrong units?

I don't think so - it only checks on the first event, and assumes all subsequent events are the same. In principle it doesn't even look at the units in the FIeld, only at what is assigned in the first event (except for in .validate())

maxnoe commented 4 years ago

I don't think so - it only checks on the first event, and assumes all subsequent events are the same.

No, it calls to_value(UNIT_OF_FIRST_ROW), which will fail.

kosack commented 4 years ago

No, it calls to_value(UNIT_OF_FIRST_ROW), which will fail.

That must have been a new addition: I didn't do it, since that is very slow to do unit conversion every time. We can remove that. The idea was that units and array shapes should never change.

maxnoe commented 4 years ago

https://github.com/cta-observatory/ctapipe/blob/21c4c72747d751d756aebd5336e6c7e5fdd68838/ctapipe/io/hdf5tableio.py#L499-L500

I didn't do it, since that is very slow to do unit conversion every time. We can remove that. The idea was that units and array shapes should never change.

to_value is only slow if the requested unit is not the unit of the quantity.

I think this is the correct behaviour, as otherwise we might write invalid data.

kosack commented 4 years ago

Ah yes, it adds a col_transform of convert_and_strip_unit... I forgot. But then it's not really using the Container unit, but the unit of the first event, so I think it is still ok. But of course container.validate() will fail, I think.

maxnoe commented 4 years ago
In [5]: a = u.Quantity(np.random.normal(size=10000), u.m)

In [6]: %timeit a.to_value(u.m)
633 ns ± 4.22 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [7]: %timeit a.to_value(u.cm)
7.62 µs ± 86.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
kosack commented 4 years ago

Actually, we could just remove the units from the HillasParametersContainer (leave them as None), and then it will all still work. Validate will then not check that the unit matches the first event, and both deg and meters will work properly, as long as you return even "blank" containers in those units (i.e. hillas.length = NaN*u.deg or hillas.length=NaN*u.m.)

I think that is the easiest solution: if units can be more than one thing, just don't specify them

maxnoe commented 4 years ago

We could also use a tuple of units if we want to check that is any of a set of allowed units.

maxnoe commented 3 years ago

As discussed here #1408, just making the units changeable is not enough, we also need other fields so they correspond correctly to the cordinate system axis names. closing.