cta-observatory / pyirf

Python IRF builder
https://pyirf.readthedocs.io/en/stable/
MIT License

Necessary improvements to IRF interpolation (Handle PDFs properly, extrapolation) #161

Closed · maxnoe closed this issue 5 months ago

maxnoe commented 3 years ago

In the current interpolation code, the amplitude of the PDF is interpolated. I think I saw a double-bump structure in the energy dispersions shown by @jsitarek and checked what happens when interpolating between two Gaussians whose mean and standard deviation depend linearly on the interpolation variable (e.g. mean and std both grow with a hidden variable z). This is, I think, the expected case for e.g. energy dispersion vs. zenith.

The results show the double bump structure and make it pretty clear that interpolating the height of the pdf is not really a good solution:

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

# mean and std of the pdf depend linearly on the hidden variable z
def expected_mean(z):
    return 5 + z / 10

def expected_std(z):
    return 1 + z / 30

x = np.linspace(0, 20, 1000)

# true pdfs at the grid points z=20, 60 and at the target z=40
zs = {
    z: norm(expected_mean(z), expected_std(z)).pdf(x)
    for z in [20, 40, 60]
}

# interpolate the pdf amplitudes bin-by-bin between z=20 and z=60
y = np.array([zs[20], zs[60]]).T
f = interp1d([20, 60], y)

plt.figure(constrained_layout=True)
for z, pdf in zs.items():
    plt.plot(x, pdf, label=f'Truth, z={z}')

plt.plot(x, f(40), label='Interpolated z=40')
plt.legend()
plt.show()

Result:

(figure: norm_interpolation, the interpolated z=40 curve shows a double bump between the two true Gaussians)

So we should probably reconsider the approach for PSF / edisp interpolation. We should also add a warning to the docs that the interpolation methods are experimental and not yet verified to work in real analyses.

jsitarek commented 3 years ago

Hi @maxnoe

The accuracy and the possible features will depend strongly on what we are interpolating. The current code is a pretty simple approach of amplitude interpolation, but I first wanted something simple that would not depend on assumptions, especially since at the time of development we had no data to test it properly. Now that we have more MCs generated, I can look into more clever approaches. One possibility I was thinking about is to compute the first and the central second moments, normalize the distributions to them, and then interpolate the values of those moments. This would work much better (and actually give the exactly correct solution) in the case you quoted, but a technical issue is that you need some base shape of the distribution (I could use the closest point in the grid for this).
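A minimal sketch of what I have in mind (toy code, assuming the PDFs are sampled on a common x grid; moments and moment_interp are hypothetical helpers, not pyirf API):

import numpy as np

def moments(x, pdf):
    # first moment and central second moment of a pdf sampled on x
    norm = np.trapz(pdf, x)
    mean = np.trapz(x * pdf, x) / norm
    std = np.sqrt(np.trapz((x - mean) ** 2 * pdf, x) / norm)
    return mean, std

def moment_interp(x, pdf_base, mean_target, std_target):
    # shift and scale the base shape to the interpolated target moments;
    # the factor std_b / std_target is the Jacobian keeping the pdf normalized
    mean_b, std_b = moments(x, pdf_base)
    xi = (x - mean_target) * std_b / std_target + mean_b
    return np.interp(xi, x, pdf_base, left=0, right=0) * std_b / std_target

The moments themselves would be interpolated between the grid points (e.g. linearly), with the base shape taken from the nearest grid point.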

I will have a look and let you know. Nevertheless, even if a more clever solution is implemented, I think it would also be good to keep an option to run with the original code.

moralejo commented 3 years ago

Hi @jsitarek,

> Now that we have more MCs generated, I can look into more clever approaches. One possibility I was thinking about is to compute the first and the central second moments, normalize the distributions to them, and then interpolate the values of those moments. This would work much better (and actually give the exactly correct solution) in the case you quoted, but a technical issue is that you need some base shape of the distribution (I could use the closest point in the grid for this).

Can you elaborate more on this? Do you plan to amplitude-interpolate the normalized distributions, then obtain the final one using the interpolated central moments? One has to keep in mind that the energy dispersion at low E will be quite asymmetric.

I think it is worth having a look at existing studies; clearly we cannot be the only ones needing to interpolate PDFs which are non-Gaussian. I found a couple of papers which deal with the topic (but have not had time to look at them in any detail):

https://www.sciencedirect.com/science/article/pii/S0096300395002162
https://engineering.ucsc.edu/sites/default/files/technical-reports/UCSC-SOE-13-13.pdf

moralejo commented 3 years ago

I'd say the quantile method mentioned in the second publication has a good chance of working well in our case.

maxnoe commented 3 years ago

@moralejo Thanks, indeed the quantile based approach looks promising.

moralejo commented 3 years ago

(figure: cumulative distributions of the true Gaussians and the quantile-interpolated result)

It solves your example exactly; hopefully it will also work for not-quite-but-close-enough-to-Gaussian distributions. (Shown are the cumulatives, not the PDFs.)
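For illustration, a minimal sketch of the quantile approach on the Gaussian example above (toy code, not a pyirf implementation): linearly interpolate the inverse CDFs (quantile functions), then differentiate the resulting CDF to recover the PDF.

import numpy as np
from scipy.stats import norm

x = np.linspace(0, 20, 1000)
quantiles = np.linspace(1e-4, 1 - 1e-4, 500)

# quantile functions (inverse CDFs) at the grid points z=20 and z=60
ppf_20 = norm(5 + 20 / 10, 1 + 20 / 30).ppf(quantiles)
ppf_60 = norm(5 + 60 / 10, 1 + 60 / 30).ppf(quantiles)

# linear interpolation in quantile space for the target z=40
w = (40 - 20) / (60 - 20)
ppf_40 = (1 - w) * ppf_20 + w * ppf_60

# rebuild the CDF on the x grid and differentiate to get the pdf;
# for Gaussians this reproduces norm(9, 7/3).pdf(x) up to numerics
cdf_40 = np.interp(x, ppf_40, quantiles, left=0, right=1)
pdf_40 = np.gradient(cdf_40, x)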

moralejo commented 3 years ago

By the way, I think the "amplitude" interpolation is probably the adequate one for the effective area, while the quantile interpolation is adequate for the migration matrix and the PSF.

maxnoe commented 3 years ago

@moralejo yes, this is specifically about the interpolation of the PDF-like IRFs.

RuneDominik commented 3 years ago

Hi all, there seems to be another possible solution (see this paper) that does not need the integration/differentiation the CDF approach uses, treats both horizontal and vertical morphing, and is extendable to more than one hidden variable. It does solve the Gaussian example exactly

Gaussians.pdf

and a first test with actual energy migration yields promising results (although the true migration at z=30 would be needed to assess this properly):

Interp_Energ_Disp.pdf
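For reference, a rough 1D sketch of this family of methods (my own illustration of a moment-morphing-style combination; the actual algorithm in the paper may differ in details): interpolate the first two moments, transform each grid pdf to those moments, and combine the transformed shapes with linear weights.

import numpy as np

def morph(x, pdf_a, pdf_b, mom_a, mom_b, w):
    # mom_* = (mean, std) of each grid pdf; w in [0, 1] is the linear weight
    mean = (1 - w) * mom_a[0] + w * mom_b[0]
    std = (1 - w) * mom_a[1] + w * mom_b[1]

    def to_target(pdf, mean_i, std_i):
        # map the target grid back onto each input shape, Jacobian included
        xi = (x - mean) * std_i / std + mean_i
        return np.interp(xi, x, pdf, left=0, right=0) * std_i / std

    return (1 - w) * to_target(pdf_a, *mom_a) + w * to_target(pdf_b, *mom_b)

For two Gaussians with linearly varying moments this reproduces the target exactly, without the integration/differentiation step.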

maxnoe commented 2 years ago

> Hi, I think that besides interpolation we definitely need extrapolation, because one cannot build a grid that covers all possible cases inside it (much less in an efficient way).

Ok, please let's collect requirements for the IRF inter/extrapolation. Otherwise, we will start implementing algorithm after algorithm that is not suitable because we forgot a requirement.

Currently on my list are:

For now, we will most likely only need point-like IRFs, so PSF is not as important, but it's essentially the same as energy migration (plus the per steradian part).

Anything I missed?

maxnoe commented 2 years ago

This might be helpful (and wouldn't require adding a new dependency): https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.Rbf.html

moralejo commented 2 years ago

But it does not mention extrapolation, right?

maxnoe commented 2 years ago

I found it by searching for "scipy ND extrapolation griddata". According to this SO answer, it can do extrapolation: https://stackoverflow.com/a/65151020/3838691
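A quick toy check (using the newer RBFInterpolator interface rather than the legacy Rbf class) that evaluation outside the convex hull of the points returns values instead of failing:

import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
points = rng.uniform(0, 1, size=(20, 2))   # scattered 2D grid points
values = points[:, 0] + 2 * points[:, 1]   # a linear test function

rbf = RBFInterpolator(points, values, kernel='linear')

print(rbf(np.array([[0.5, 0.5]])))  # inside the convex hull
print(rbf(np.array([[1.5, 1.5]])))  # outside: returns a value, no error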

moralejo commented 2 years ago

Ah yes, it seems to work in that test; funny that the docs do not mention it.

jsitarek commented 2 years ago

Hi,

Since extrapolation is not explicitly mentioned for this function, it might be that this functionality is not really supported and the result can be imprecise. I played a bit with the function and it seems to me that once you switch from interpolation to extrapolation, the accuracy degrades considerably (even for distances <~ grid size); please see the attached notebook Rbf_interpolation.ipynb.gz

moralejo commented 2 years ago

Hi @jsitarek, we will not need extrapolation over long distances; it is just to cover the edges of the phase space. In any case, I don't know in which sense a simple linear inter/extrapolation, i.e. just defining a plane with 3 (x, y, z) points and evaluating z at another point (x_target, y_target), can be said to be imprecise in general. The method itself works the same for inter- and extrapolation. Precision will depend on how good the assumption of linear behaviour is for the quantity in question, on how far apart the grid points are, and on how distant the target point is. None of those have to do with the method itself, which is pretty simple and could be implemented in a few lines of code, right?
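For illustration, such a plane through three (x, y, z) grid points really is only a few lines (a toy sketch, not pyirf code):

import numpy as np

def plane_eval(p1, p2, p3, x, y):
    # fit z = a*x + b*y + c through the three grid points, evaluate at (x, y)
    A = np.array([[px, py, 1.0] for px, py, _ in (p1, p2, p3)])
    z = np.array([pz for _, _, pz in (p1, p2, p3)])
    a, b, c = np.linalg.solve(A, z)
    return a * x + b * y + c

# works identically inside or outside the triangle, i.e. for extrapolation
print(plane_eval((0, 0, 1.0), (1, 0, 2.0), (0, 1, 3.0), 1.5, 1.5))  # 5.5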

jsitarek commented 2 years ago

Hi @moralejo,

You know, if it is only the edges of the phase space, it would not take many more MC productions to have those edges produced as well :-). It depends a bit on how far you mean by edges, but in the notebook I was playing with, the extrapolation was much less precise already on distance scales of the grid size. Of course I would expect it to work worse, but the difference looked way too large for my taste. I tried a few different functions (products of 2nd-degree polynomials, but also linear functions), and it did not change this conclusion much. Please note that this RBF interpolation is not the simple linear interpolation you mention, but something much fancier: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.RBFInterpolator.html#scipy.interpolate.RBFInterpolator

moralejo commented 2 years ago

> You know, if it is only the edges of the phase space, it would not take many more MC productions to have those edges produced as well :-)

Well, the physical edge is convex, and we cannot go beyond it unless we changed the observatory site or the B-field orientation in the MC simulations, which I am pretty sure is something you would not deem sensible.

> Please note that this RBF interpolation is not the simple linear interpolation you mention, but something much fancier:

Then it is not what we want. I think we just need linear inter/extrapolation, which, if not available, can be implemented quite easily.

jsitarek commented 2 years ago

If it is really just going a bit beyond the edge, one could even use the nearest-neighbour "interpolation" that is available in the current implementation. I did not check, but I think it should work fine for extrapolation as well (one would just need to switch the method for those extrapolation cases, which can be done with one of the parameters of the interpolation function).

Linear interpolation is indeed easy (though not necessarily the most precise), but only if the points are on a regular grid. The current implementation allows for arbitrary positions of the points, and then the algorithm needs to find the ND version of Delaunay triangles (simplices). For interpolation this already needs some special code. For extrapolation it is much trickier, because you need to know which triangle is the "closest" one, and for arbitrary interpolation variables, distances measured along different axes have different physical meanings.
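To illustrate the point (a toy sketch): scipy's Delaunay-based linear interpolator accepts arbitrary point positions, but it returns NaN outside the convex hull, so extrapolation needs extra logic on top:

import numpy as np
from scipy.interpolate import LinearNDInterpolator

points = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
values = points[:, 0] + 2 * points[:, 1]

# builds a Delaunay triangulation of the (arbitrary) point positions
interp = LinearNDInterpolator(points, values)

print(interp(0.5, 0.5))  # inside the hull: 1.5
print(interp(2.0, 2.0))  # outside the hull: nan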

maxnoe commented 2 years ago

> Then it is not what we want. I think we just need linear inter/extrapolation, which, if not available, can be implemented quite easily.

I think we have clearly established that we need something more than "plain simple and quite easy" at this point. If there is a well-tested, ready-to-use method that works, let's not try to implement a "simpler" method ourselves.

moralejo commented 2 years ago

> If it is really just going a bit beyond the edge, one could even use the nearest-neighbour "interpolation" that is available in the current implementation. I did not check, but I think it should work fine for extrapolation as well (one would just need to switch the method for those extrapolation cases, which can be done with one of the parameters of the interpolation function).

You may be between two grid points, just outside the edge of the grid... obviously going for the nearest neighbour is less precise than interpolating between those two grid points.

> Linear interpolation is indeed easy (though not necessarily the most precise),

There is no guarantee that a more complicated method gives a better result, since that will depend on the interpolated quantity, i.e. on the physics.

> but only if the points are on a regular grid. The current implementation allows for arbitrary positions of the points, and then the algorithm needs to find the ND version of Delaunay triangles (simplices).

We define the grid, and we make it regular (not rectangular, though), so what we want is indeed to inter/extrapolate from a regular grid.

> For interpolation this already needs some special code. For extrapolation it is much trickier, because you need to know which triangle is the "closest" one,

On a regular triangular grid, the 3 grid points closest to the target are the ones to use; no complication.

> and for arbitrary interpolation variables, distances measured along different axes have different physical meanings.

Indeed, and that is why we define a regular grid in parameters we have chosen, such that the variation of the quantities of interest is similar for the steps along both axes...

moralejo commented 2 years ago

> I think we have clearly established that we need something more than "plain simple and quite easy" at this point.

Have we? Sorry, but then I am lost...

We decided on parameters (cos Zd and B_orthogonal) on which the performance should change more or less uniformly and not too fast, with the aim of facilitating the interpolation.

A triangular grid is the most efficient way of covering the space of interest with a minimal number of points for a given average distance to the nearest grid points.

A linear interpolation is the simplest possible approach to get the values in an arbitrary target point.

Can a more complicated interpolation work better? Perhaps, but I don't think there can be one that will work better in general for all the quantities of our interest. It all depends on how the quantities vary, so perhaps we eventually want something more complicated for some of the quantities to be interpolated, but the starting point, and certainly an available option, should be the linear interpolation.

> If there is a well-tested, ready-to-use method that works, let's not try to implement a "simpler" method ourselves.

Sure, I am all for using existing code. And certainly there is something for defining a plane with three points, and evaluating it at an arbitrary (x,y), inside or outside of the triangle they form.

RuneDominik commented 2 years ago

To push this discussion forward: after discussing with @chaimain, we agreed that it would be desirable to have some kind of benchmark or measure of when an interpolation method is deemed "good enough". With #163 and #174 there are two methods that can in principle tackle the task at hand. Their results are of course different: #163 performs better in the test case of the IRFs @chaimain kindly provided, while #174 scales better to nD interpolation variables.

A comparison can be found here: CurrentStatus.pdf

A version with hidden code cells (.html in a .zip, as GitHub does not allow direct .html files): CurrentStatus.zip

moralejo commented 2 years ago

Thanks for reviving this @RuneDominik.

In the tests you link it would be good to see the uncertainties (of the "true" values at least; for the interpolation it would need some calculation) to get an idea of which deviations are significant and which are not.

Indeed the simpler quantile interpolation seems to work better, and I don't see the better scaling to nD as a significant advantage of the other method: the problem seems to be the "Delaunay triangulation to find the simplex in which the target point lies", but we will design the grid of points (in principle just 2 coordinates, related to pointing direction), so the grid nodes to be used for each actual observation direction could be calculated just once per MC library and retrieved from a LUT.

maxnoe commented 2 years ago

Can we check the two methods on a high-resolution, artificial example to exclude that there is just a bug in the implementation of the second method?

jsitarek commented 2 years ago

Hi @RuneDominik, thanks for the study and the new implementation. There are still some considerable differences between the interpolated and the true distributions (in the tails), but the grid for this was very broad, so it is natural to have large differences even with an optimized method. We will know more with the ongoing production (at least the code can be tried once the Crab-path is completed).

RuneDominik commented 2 years ago

I've done some more testing using Gaussian distributions; the results are appended below, at the end of each file. The gist of these tests:

Cannot be handled:

- Strong, non-linear dependencies on the hidden variable (e.g. having mean = m^2) are problematic; see the toy check after this list. This is not surprising, since both methods use a linear interpolation between the ppfs. Consequently, non-linear dependencies on two hidden variables (e.g. having mean = m1*m2) are also problematic.
- (Strongly) skewed distributions cannot be estimated well.

Can be handled:

- Linear dependencies in the distributions (e.g. having mean = m with hidden variable m) and non-linear changes on sufficiently small scales (e.g. having mean = cos(m1)*cos(m2)).
- Linear combinations of hidden variables (e.g. having mean = m1 + m2).
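To make the first failure mode concrete, a minimal check (toy numbers, not from the notebooks): linearly interpolating the ppfs between grid points m=1 (mean 1) and m=3 (mean 9) for a Gaussian with mean = m^2 yields mean (1 + 9) / 2 = 5 at m=2, while the true mean is 2^2 = 4.

import numpy as np
from scipy.stats import norm

q = np.linspace(1e-4, 1 - 1e-4, 1000)

# halfway between the grid points m=1 and m=3 for mean = m^2, std = 1
ppf_interp = 0.5 * (norm(1, 1).ppf(q) + norm(9, 1).ppf(q))
print(ppf_interp.mean())  # ~5, but the true mean at m=2 is 4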

The new method seems to give slightly better results than the old one.

A .pdf with visible code cells: CurrentStatus_inclGaussianTests.pdf

A .zip archive with .ipynb and .html, both with hidden code cells to make it more readable: CurrentStatus_inclGaussianTests.zip

jsitarek commented 2 years ago

Hi @RuneDominik

Sorry for the very late feedback. Very thorough study; I had a deeper look into the files you posted. When you test different scalings of sigma (cell 22) you write: "Methods perform comparable, new method again better." Yet in all the plots the new method is slightly broader, so if anything slightly worse (though I would say they are simply comparable). At the end you write "The new implementation seems to be slightly more stable." I would say that in the plots the only place where the new method was better was the skewed distribution, and even that was marginal. On the other hand, in your cell 17 the new method shows larger outliers in the bottom plot (best visible for the blue curve). In the end, your original quantile interpolation method already works pretty well, and the adapted one does not seem to improve things, so why not simply stay with the original one?

maxnoe commented 5 months ago

I think with the different method implemented by @RuneDominik, we can close this and discuss any further things in new issues.