fmkroci opened this issue 3 months ago (status: Open)
Some additional remarks:
The interpolation range can be extended to lower values without problems, which already reduces the occurrence of negative values.
Maybe this could be combined with a clip (perhaps applied once the gradient becomes numerically small?):
op = _InterpolationOperator(self._domain, lambda x: gamma.ppf(sp_norm._cdf(x), self._alpha), -50., 8.2, self._delta)
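As a quick sanity check of the extended range, here is a minimal sketch in plain scipy, independent of NIFTy (the shape parameter a=2.0 is an arbitrary example value): the tabulated function stays finite and non-negative down to very negative latent values.
import numpy as np
from scipy.stats import gamma, norm

# tabulate gamma.ppf(norm.cdf(x)) over the extended range; norm.cdf
# underflows to exactly 0 for very negative x, where gamma.ppf returns 0
x = np.linspace(-50.0, 8.2, 1000)
y = gamma.ppf(norm.cdf(x), a=2.0)
assert np.all(y >= 0) and np.all(np.isfinite(y))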
For large positive values of the latent space the extrapolation behaves quite differently for different choices of delta:
[Plot: scan of the latent space using the GammaOperator]
[Plot: the same scan using the InterpolationOperator extended to the left]
Thank you for your interest in NIFTy and for your bug report!
I think the current implementation of the inverse gamma operator in NIFTy should never produce values smaller than zero by default. In theory, the interpolation should be run in log-space and the result should be exponentiated afterwards. Are you using a custom-built inverse gamma operator? Can you provide a reproducing example of where the inverse gamma operator becomes negative?
I used the following as a quick validation:
import numpy as np
import nifty8 as ift

n = 100_000
spc = ift.UnstructuredDomain(n)
ig = ift.InverseGammaOperator(spc, 1.0, 1.0)
# check that the operator output is non-negative everywhere on a wide latent range
assert np.all(ig(ift.makeField(spc, np.linspace(-10.0, 10.0, n))).val >= 0)
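For context, a minimal sketch of that log-space idea in plain scipy (not the NIFTy implementation; the table range and a=1.0 are assumptions): interpolate the logarithm of the quantile table and exponentiate afterwards, so the output stays strictly positive even where the spline overshoots.
import numpy as np
from scipy.stats import invgamma, norm
from scipy.interpolate import CubicSpline

xs = np.linspace(-8.2, 8.2, 2000)
log_table = np.log(invgamma.ppf(norm.cdf(xs), a=1.0))  # table in log-space
spline = CubicSpline(xs, log_table)

def op(x):
    # exponentiation guarantees op(x) > 0 for any spline value
    return np.exp(spline(x))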
Hey, thank you for your answer. I am looking at the GammaOperator, not the InverseGammaOperator.
I think the main reason for the dramatic difference in the extrapolation behavior is connected to the cubic spline interpolation. It should be easy enough to extend the interpolation to smaller values and combine it with clipping, since gamma.ppf(sp_norm._cdf(x)) is stable for smaller values. For larger values the problem is more complicated: it would require some approximation of the tail of gamma.ppf(sp_norm._cdf(x)) (which should work for parameters that make the gamma distribution close to a normal). I have a test implementation ready that seems to work for the most part. However, the tail approximation produces a discontinuity when switching over from the interpolation; see the sketch below for one way to remove it.
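One standard way to remove such a discontinuity (a hypothetical sketch, not the author's implementation) is to shift the tail branch by its mismatch at the switch point x0:
import numpy as np

def glue(interp, tail, x0):
    # offset the tail approximation so both branches agree at x0
    offset = interp(x0) - tail(x0)
    return lambda x: np.where(x <= x0, interp(x), tail(x) + offset)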
Here is an example to reproduce the problem (note that it uses the legacy version of NIFTy, and the result might change for a different interpolation method):
import nifty8 as ift
import numpy as np
import matplotlib.pyplot as plt
from nifty8.library.special_distributions import GammaOperator

# helper to automatically create the domain -> default gamma operator
def UnnamedGammaTransform(mean, sigma, N_copies=0, delta=1e-2):
    if N_copies < 0:
        raise ValueError("N_copies must be non-negative")
    if N_copies == 0:
        domain = ift.DomainTuple.scalar_domain()
    else:
        domain = ift.UnstructuredDomain(N_copies)
    return GammaOperator(domain=domain, mean=mean, var=sigma**2, delta=delta)

x_latent_eval = np.arange(-30, 0.0, 0.01)
sigma = 0.1
mean = 0.350

# scan the latent space for several interpolation step sizes delta
for delta in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    op = UnnamedGammaTransform(sigma=sigma, mean=mean, delta=delta)
    a = np.array([op(ift.makeField(op.domain, x)).val for x in x_latent_eval])
    plt.plot(x_latent_eval, a, label=delta)

plt.title('GammaOperator Stability')
plt.xlabel('x latent')
plt.ylabel('value of gamma op')
plt.legend()
plt.show()
Should produce this plot:
[Plot: GammaOperator output over the latent range for each delta]
I misread your bug report. The gamma operator can indeed produce zeros even though it shouldn't do that. We use the inverse gamma operator a lot more than the gamma operator and my head auto-corrected gamma to inverse gamma. Sorry about that! :grimacing:
Clips can be dangerous if the model at least sometimes goes into that regime. We use gradients and curvatures a lot, and if you clip somewhere that is not reached within sampling or within the line searcher of the NCG, you might run into problems later on, as the parameter is never changed again. What do you think about doing the interpolation for gamma in log-space as well? This way, we would ensure strict positivity without zero gradients.
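To illustrate the zero-gradient problem, here is a minimal standalone example using JAX (not NIFTy code): once a latent value sits beyond a hard clip, its gradient vanishes, so gradient-based schemes can never move it back.
import jax
import jax.numpy as jnp

clipped = lambda x: jnp.clip(x, 0.0, None)  # hard clip at zero
print(jax.grad(clipped)(-1.0))              # prints 0.0 -> parameter is frozen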
Hi, thanks a lot for the insights!
I can have a look at the InverseGammaOperator to check whether I can transfer the log-based interpolation. Seems like a useful approach! Any ideas on how to handle the instabilities for large values?
I wanted to use a gamma distribution as a prior for strictly positive quantities in a parametric part of my model. As it has a lighter tail than both InverseGamma and Lognormal, this avoids situations during the optimization where parameters drift off to large values. (By the way, I think it would also be a useful default prior on the flexibility and asperity parameters of the correlated field model for that reason.)
This is also the reason why these extreme values matter in the first place. If the likelihood is very informative, it can move the posterior far away from the prior. However, this will probably affect the values closer to zero more than the values far above the mean, so maybe this is a non-issue.
The log-space interpolation should be trivial to implement using the table functions feature of the interpolator. Actually, I quickly adapted the gamma operator accordingly. Can you check whether the branch https://github.com/NIFTy-PPL/NIFTy/tree/gamma_in_log_space resolves the issues you are encountering?
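For reference, the change presumably boils down to something like the following fragment, in the style of the snippet quoted above (the table_func/inv_table_func keyword names are assumed from the interpolator's table-function feature and may not match the branch exactly):
# inside GammaOperator: tabulate log(gamma.ppf(norm.cdf(x))) and
# exponentiate after interpolation, so the output is strictly positive
op = _InterpolationOperator(
    self._domain,
    lambda x: gamma.ppf(sp_norm._cdf(x), self._alpha),
    -8.2, 8.2, self._delta,
    table_func=np.log, inv_table_func=np.exp,
)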
I think the way to proceed regarding any further instabilities is to find asymptotic approximations. I would be happy to review and merge a PR that implements these!
The log-normal distribution has some nice theoretical guarantees if one follows the maximum entropy principle (maximally open to changes while still enforcing positivity and a given mean and std), which, I think, we want to keep moving forward. For most models the prior is very loose and the posterior is fully within the prior volume, so the model never has to go to latent values larger than, say, 6 or so (6 std. away is already ridiculously far!). The art is to find the right parametrization that is sufficiently flexible yet still constraining at the same time...
Hey, thanks for the update! I will have a look at your branch tomorrow. I also adapted my test code, and it seems to work perfectly for small values. Thanks a lot! There is an asymptotic approximation for the tail of the gamma distribution using the Cornish-Fisher expansion. I tried it, in a similar way to the log-space interpolation, using a JaxOperator; it is then quite easy to handle both cases at the same time. I will provide you with an update once I have worked out whether the expansion is correct.
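For reference, a sketch of the standard fourth-order Cornish-Fisher expansion for the gamma quantile (generic textbook form under the assumed parametrization Gamma(alpha, theta); not the author's code):
import numpy as np

def gamma_quantile_cf(z, alpha, theta=1.0):
    # skewness and excess kurtosis of Gamma(alpha, theta)
    s = 2.0 / np.sqrt(alpha)
    k = 6.0 / alpha
    # Cornish-Fisher correction of the standard normal quantile z
    w = (z + (z**2 - 1) * s / 6
         + (z**3 - 3 * z) * k / 24
         - (2 * z**3 - 5 * z) * s**2 / 36)
    return alpha * theta + np.sqrt(alpha) * theta * w  # mean + std * w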
This fixes the issue for values that are close to zero. I will write down the expansion for the tail of the gamma distribution to confirm that my current code works correctly. (It seems to at the moment, and I can evaluate the latent space from -500 to 500 without issues!)
I have to use a JaxOperator to switch between the two interpolations and can therefore only use a linear interpolation; a rough sketch of the switch-over follows below. If you are interested, I could submit a PR.
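A rough sketch of what such a switch-over could look like (hypothetical names, not the author's code; jnp.interp only supports linear interpolation, which is the restriction mentioned above):
import jax.numpy as jnp

def gamma_op(x, xs, table, x_switch, alpha, theta=1.0):
    interp = jnp.interp(x, xs, table)  # linear interpolation of the table
    # Cornish-Fisher tail approximation, as in the sketch above
    s = 2.0 / jnp.sqrt(alpha)
    w = (x + (x**2 - 1) * s / 6
         + (x**3 - 3 * x) * (6.0 / alpha) / 24
         - (2 * x**3 - 5 * x) * s**2 / 36)
    tail = alpha * theta + jnp.sqrt(alpha) * theta * w
    return jnp.where(x < x_switch, interp, tail)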
I would be happy to review a PR adjusting the tails of the gamma distribution for NIFTy.re (the new JAX flavor of NIFTy), but I think we should not introduce a hard JAX dependency into the "numpy" NIFTy code via a JaxOperator in the gamma distribution.
Hi,
I will have a look at the legacy implementation when I can make some time (sorry, I had some other urgent things to implement). If you want, I could push the current changes to a branch on a fork of NIFTy. This way the different implementation would be available in principle.
For our project we still use the old implementation of NIFTy, but we would like to switch to nifty.re once we have the time to update our code (this might take a while).
No worries. Given that you are not blocked by this, I do not believe this issue needs to be urgently solved.
I doubt that a dangling branch will be generally useful. Let's focus on doing it properly via a PR for NIFTy and NIFTy.re once you have the time :)
Thanks a lot for the help! I will update you once I have the fixes ready.
I noticed that, due to the interpolation of the inverse cdf (actually of gamma.ppf(norm.cdf(x))), the application of the GammaOperator may return negative values if evaluated more than ca. 8 sigma away from 0 (in the latent space).
The interpolation you use is cubic and based on a table from -8.2 to 8.2 (I assume it is this range since the evaluation of gamma.ppf(norm.cdf(x)) is not very stable beyond that point). Once you evaluate the operator beyond those limits, the extrapolation of the cubic spline seems to cause issues, as demonstrated below.
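A minimal standalone demonstration of this failure mode in plain scipy (not the NIFTy code; the shape parameter a=0.5 is an arbitrary example, and whether the result actually goes negative depends on the parameters and table spacing, mirroring the delta dependence above):
import numpy as np
from scipy.stats import gamma, norm
from scipy.interpolate import CubicSpline

xs = np.linspace(-8.2, 8.2, 2000)
spline = CubicSpline(xs, gamma.ppf(norm.cdf(xs), a=0.5))
# extrapolation beyond the table range follows the outermost cubic
# polynomial and is not guaranteed to stay non-negative
print(spline(-12.0))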
I'm not sure if this qualifies as a bug, since it happens quite far away from zero. However, I had some quite unlucky starting values for my fit that caused intermediate negative values during the fit.
One option would be to clip the values, but I wonder whether there are simple asymptotic functions that could be used to extend the range.
Short summary:
Expected behavior: only returns values greater than 0.
Observed behavior: the numerical approximation allows values smaller than 0.