Photrek / Nonlinear-Statistical-Coupling

Apache License 2.0
5 stars 1 forks source link

CoupledNormal sampling inconsistent with Student's t #27

Open hxyue1 opened 3 years ago

hxyue1 commented 3 years ago

I was looking through the CoupledNormal class and was curious about the sampling method. I did some testing to compare it with the Student's t distribution and it looks like they're not consistent.

From my understanding, a coupled normal has the same distribution as a student's t with df = 1/kappa, provided loc and scale are the same. Empirically when I sample from the CoupledNormal and the numpy implementation of the student's t, the results are inconsistent, and in some cases wildly different.

sample_size = 1000
kappa = 0.4
loc = 0.
scale = 1.

np.random.seed(0)
cn = CoupledNormal(loc=loc, scale=scale, kappa=kappa, alpha=2)
cn_samples = cn.sample_n(n=sample_size)
t_samples = t.rvs(df=1/kappa, loc=loc, scale=scale, size=sample_size)

fig, ax = plt.subplots(figsize=(8, 5))
plt.hist(cn_samples, label='coupled normal', bins=30, alpha=0.5)
plt.hist(t_samples, label='students-t', bins=30, alpha=0.5)
plt.legend()

image

It seems to be particularly bad when scale > 1 and kappa is small

sample_size = 1000
kappa = 0.1
loc = 0.
scale = 10.

np.random.seed(0)
cn = CoupledNormal(loc=loc, scale=scale, kappa=kappa, alpha=2)
cn_samples = cn.sample_n(n=sample_size)
t_samples = t.rvs(df=1/kappa, loc=loc, scale=scale, size=sample_size)

fig, ax = plt.subplots(figsize=(8, 5))
plt.hist(cn_samples, label='coupled normal', bins=30, alpha=0.5)
plt.hist(t_samples, label='students-t', bins=30, alpha=0.5)
plt.legend()

image

kenricnelson commented 3 years ago

Thanks for taking a close look at the Coupled Distribution sampler @hxyue1 . @Kevin-Chen0 and @jkclem my view is that at the distribution level the library should be closely compatible with the TensorFlow Probability library. If possible, I recommend copying the Student's T sampler from TFP; otherwise, possibly copying the sampler from Numpy. I'm concerned that the subtleties of sampling from a heavy-tail distribution are difficult to build from scratch and even more difficult to test adequately. In our meeting last week @jkclem mentioned that he was only evaluating the sample for small values of kappa; however, I'm not sure it makes sense to spend development time on a custom solution that we don't know upfront will be reliable for the full range of possible distributions.

Kevin-Chen0 commented 3 years ago

Hey Kenric,

@jkclem is almost done with the sampler method for the MultivariateCoupledNormal (MCN) class. Let's let him finish with his current version and then test it with @hxyue1 code above. If this new version is also not accurate, then we can consider just using TFP's sampler code instead. Right now, MultivariateCoupledNormal takes a much higher precedent and I wouldn't worry too much about the functions for CoupledNormal.

kenricnelson commented 3 years ago

Yes, let's review and discuss the MCN. @hxyue1 perhaps you want to take a look at John's new version when he has it ready

hxyue1 commented 3 years ago

Yes I can definitely do that!

hxyue1 commented 3 years ago

While I was working on the tensorflow integration for the CoupledNormal class, I fixed the sample_n method. The new code is in the file CoupledNormal_tf on the tensorflow branch.

I'll need to check with @jkclem on how it has been implemented in the MultivariateCoupledNormal but for now I think this should suffice.

jkclem commented 3 years ago

The sampling function should be fixed now for both the CoupledNormal and MultivariateCoupledNormal classes.