Mix of shape and scale in bayes.vb.Gamma

BAMresearch / bayem

Implementation and derivation of "Variational Bayesian inference for a nonlinear forward model." [Chappell et al. 2008] for arbitrary, user-defined model errors.

MIT License

2 stars 1 forks source link

Mix of shape and scale in bayes.vb.Gamma #16

Closed TTitscher closed 2 years ago

TTitscher commented 3 years ago

@aradermacher and @joergfunger figured out a bug in bayes.vb.Gamma where we (most likely I :smile: ) mixed up the shape (should be c) and the scale (should be s) parameter.

Should we test that or just fix it?

Edit: I may have thought scale has a c in it and accidentally connected them this way. Should we maybe adapt the k, theta notation from wikipedia?

aradermacher commented 3 years ago

I would prefer to use "scale" and "shape" instead of abbreviations in the definition of Gamma and s/c in the algorithm because that's conform to the paper!

ajafarihub commented 3 years ago

So finally, what would be a proper choice (shape, scale) for a non-informative gamma distribution (which in the VB, represents the precision of the noise)? Maybe we should only select a proper "shape" for the general case, since "scale" is just a scale and can be chosen according to the magnitude of an individual entry of forward-model output.

joergfunger commented 3 years ago

Setting the shape (e.g. to 1 to have an exponential distribution) is what we tried (at least that was the idea). However, I think the challenge is howto choose the scale - if we model the precision of the noise the true value should always be included in the pdf - and be careful, since the precision is increasing for lower std.

ajafarihub commented 3 years ago

Thanks. I actually think the challenge is a bit beyond, since we have no concrete idea about the precision of true noise for a real data-set. We can only have an estimate of a range for it. So, one approach would be: Suppose:

max_fw_group: an estimate of maximum magnitude of one group of forward-model output , then we can say for example:
min_std_noise = 0.001 * max_fw_group # quite small
max_std_noise = 0.5 * max_fw_group # quite big (conservative) , which leads to:
max_precision_noise = 1 / (min_std_noise ^2)
min_precision_noise = 1 / (max_std_noise ^2)

Now the question is how to set a properly-shaped gamma distribution, such that for example: percentile( 5% ) = min_precision_noise percentile(95% ) = max_precision_noise

ajafarihub commented 3 years ago

Someone has provided a nice python code to extract this info: https://www.codeproject.com/Articles/56371/Finding-Probability-Distribution-Parameters-from-P I am going to use it.

joergfunger commented 3 years ago

The procedure you suggest seems useful. Even tough I would argue that we often should have some knowledge on the noise term - otherwise the Bayesian procedure should be questioned at all. If you have no idea, then just use a frequentist approach (the essential feature of Bayesian is actually to include prior - be it even very sparse e.g. knowing that some parameter such as the noise std is always positive).

TTitscher commented 3 years ago

Jörg suggested to remove the constructor Gamma.FromSD(some_sd, shape=1) as it can hide the fact that it is actually built from a scape and a scale parameter.

A more explicit implementation could look like

shape, scale = GammaHelper.from_sd(42.)
noise_prior = Gamma(shape, scale)

and people (or at least I) will then use it as the one-liner

noise_prior = Gamma(*GammaHelper.from_sd(42.))

which also hides the shape and the scale.

So I would argue that we cannot prevent people to make errors or to use stuff, they to not understand. And, as this GammaHelper.from_sd would only be used to actually construct a Gamma, I would keep it as is. But I have no strong opinion about that and a change would be trivial. What do you think?

ajafarihub commented 3 years ago

This separation is nice:

first, get standard gamma parameters
second, get the gamma distribution itself It could make more sense IMO when we define the method 'from_sd' with two arguments (lower and upper sd):
```
shape, scale = GammaHelper.from_sd(lower_sd, upper_sd, percentile=0.05)
```
This is what I implemented here. Note that, the bound_scales and target_stds are finally leading to lower_sd and upper_sd .

joergfunger commented 3 years ago

I would also think that the separation is more straighforward, and maybe Abbas can then add his approach as an alternative.

TTitscher commented 3 years ago

I swapped shape and scale here, but there still is no real test. How to proceed here? Who is willing to invest some thoughts into a test case?

TTitscher commented 3 years ago

We try to derive a test case based on an analytical model #25