@xjing76 You might want to consider setting up pre-commit locally using the config file in this repo; the style-check tests are currently failing.
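For reference, the usual pre-commit workflow (assuming the standard `.pre-commit-config.yaml` already at the root of this repo) is roughly:

```console
pip install pre-commit
pre-commit install          # run the hooks automatically on each commit
pre-commit run --all-files  # check the whole tree once, up front
```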
Great! Following what @zoj613 said: could you install pre-commit and make sure that *every commit* at least passes the code checks, and not start on a new commit until the current one passes the tests as well?
Merging #28 (e0d64ae) into main (5287dd8) will decrease coverage by 3.78%. The diff coverage is 83.63%.

:exclamation: Current head e0d64ae differs from pull request most recent head df88f78. Consider uploading reports for the commit df88f78 to get more accurate results.

```diff
@@            Coverage Diff             @@
##              main      #28      +/-  ##
===========================================
- Coverage   100.00%   96.21%    -3.79%
===========================================
  Files            4        2        -2
  Lines          241      238        -3
  Branches        19       13        -6
===========================================
- Hits           241      229       -12
- Misses           0        8        +8
- Partials         0        1        +1
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| aemcmc/gibbs.py | 95.23% <83.63%> (-4.77%) | :arrow_down: |
| aemcmc/utils.py | | |
| aemcmc/conjugates.py | | |

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 5287dd8...df88f78.
I think I have set up the main steps for the alpha Gibbs sampler. However, I need a simpler way to compute the pre-calculated matrix `F`: its size is determined by the maximum of the observations, so it can get quite large.

I am also still looking for a way to make `srng.choice(range(N), size=1, replace=True, p=R_r(F, r, i)) for i in y` work, as it seems `p` cannot be assigned a random variable.
Great! For (2) could you provide a minimal example that illustrates the problem you're having so I can have a look?
I think the problem itself is not what I thought it was initially, but the problem still persists: it seems that I am not able to iterate through `y` (it fails with a "can't determine dimension" error):

`li = [srng.choice(at.arange(N), size=1, replace=True, p=R_r(F, r, i)) for i in y]`
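For what it's worth, a Python list comprehension over a symbolic `y` cannot work in general, because the length of `y` is unknown when the graph is built. One possible symbolic alternative is `aesara.scan`; the sketch below is only illustrative, with the real `R_r` replaced by a placeholder that normalizes a row of `F`:

```python
import aesara
import aesara.tensor as at
from aesara.tensor.random.utils import RandomStream

srng = RandomStream(seed=2309)
N = 10

y = at.lvector("y")
F = at.matrix("F")

def R_r_placeholder(F, y_i):
    # Placeholder for the real R_r computation: normalize row y_i of F
    # into a probability vector.
    row = F[y_i]
    return row / row.sum()

def step(y_i, F):
    # One draw per observation y_i, with a per-observation probability vector.
    return srng.choice(at.arange(N), size=1, replace=True, p=R_r_placeholder(F, y_i))

l_i, updates = aesara.scan(step, sequences=[y], non_sequences=[F])
```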
But based on equations (23-25) in the paper, we would need to sample `l_i` separately for each new `r` and each `y_i`:

`li = srng.choice(at.arange(N), size=N, replace=True, p=R_r(F, r, 10))`

This does not work either, as at some steps the resulting `p` vector is full of `nan`s. Any suggestions on how I should debug this?
Do you have a setup that allows you to work with a debugger such as pdb? You should then be able to go up the stack trace when said line fails and investigate where these `NaN`s come from.
I do have `--pdb` set up, but the results are coming from a graph op and there are more arguments than what we supply. I will take a further look into it. Thanks!
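One other option that may help when the failure happens inside a compiled graph: Aesara ships a `NanGuardMode` that raises as soon as a `NaN` appears in any intermediate value, pointing at the offending op. A minimal sketch (the computation here is a placeholder, not the sampler's graph):

```python
import aesara
import aesara.tensor as at
from aesara.compile.nanguardmode import NanGuardMode

x = at.vector("x")
y = at.log(x)  # placeholder computation; produces NaN for negative inputs

f = aesara.function(
    [x],
    y,
    mode=NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=False),
)

f([-1.0, 2.0])  # raises, identifying the op that produced the NaN
```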
1. On making the `F` and `R_r` matrices numerically stable: I didn't use the Stirling-number method mentioned in this paper (lemma 2.1), as Stirling numbers can overflow pretty easily around 100. Instead I added a switch, as in the paper, so that when `r > 1` we use the unweighted matrix `F` instead of the weighted matrix `R_r` (see the first sketch after this list).
2. `ChoiceRV` in Aesara only takes a vector as `p`, so I am coming up with a `MultiChoiceRV` here (which is not working yet) to solve this issue temporarily, though I am hoping we might eventually add a `ChoiceRV` to Aesara that can handle this case (see the second sketch after this list). So far, I think the engineering for the `MultiChoiceRV` is done: the shape and unit tests are passing without much issue. However, it seems that the sampling step itself is not converging: looking at the samples for `r`, they keep getting larger and larger, while the true dispersion term is only 1. That will be my next step.
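As referenced in item 1, a minimal sketch of that kind of switch, assuming `F` and `R_r` are already available as tensors (all names here are placeholders):

```python
import aesara.tensor as at

F = at.matrix("F")      # unweighted, numerically safer matrix
R_r = at.matrix("R_r")  # weighted matrix
r = at.scalar("r")

# Use the unweighted matrix F when r > 1, the weighted matrix R_r otherwise.
M = at.switch(at.gt(r, 1.0), F, R_r)
```

And for item 2: this is not the PR's `MultiChoiceRV`, but one common way to draw one categorical sample per row of a probability matrix with existing ops is the inverse-CDF trick, sketched here:

```python
import aesara
import aesara.tensor as at
from aesara.tensor.random.utils import RandomStream

srng = RandomStream(seed=2309)

# P is an (n, N) matrix whose rows are probability vectors, e.g. one row
# of R_r-derived probabilities per observation y_i.
P = at.matrix("P")

u = srng.uniform(0.0, 1.0, size=(P.shape[0],))
cdf = at.cumsum(P, axis=1)
# For each row, count how many cumulative probabilities the uniform draw
# exceeds; that count is the sampled category index.
l_i = at.sum(u[:, None] > cdf, axis=1)

sample_l = aesara.function([P], l_i)
```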
I updated the parameters of the `l_i` sampling step to what is set in the Matlab example, and the sample results seem a bit more reasonable and better behaved.

I think it may not be enough to just test the shape of the samples; we should also test the values of those samples.
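For example, a hypothetical sketch of such a value-level check (the function name and tolerance are illustrative assumptions, not the repo's test suite):

```python
import numpy as np

def check_dispersion_recovery(dispersion_samples, true_r=1.0, tol=0.5):
    # `dispersion_samples` is assumed to hold posterior draws of r from data
    # simulated with dispersion `true_r`; drop the first half as burn-in,
    # then require the posterior mean to be near the truth.
    post = np.asarray(dispersion_samples)[len(dispersion_samples) // 2 :]
    assert np.abs(post.mean() - true_r) < tol
```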
However, the dispersion samples seem unreasonably large:

```
array([[99372.56587824],
       [ 4997.51240255],
       [ 6943.54013679],
       ...,
       [ 6448.89043   ],
       [ 5958.37300573],
       [ 6246.96751815]])
```
Looking only at `test_horseshoe_nbinom`, the betas sampled in that setup are not converging very well either, so I set up a separate branch that fixes the betas and only samples the dispersion part. However, the dispersion portion still converges to a much larger number than the value we preset (1). I attempted to fully convert the code to the Matlab implementation, but I think I still need some more time to figure out the model, as the `phi` portion used in the Matlab code is intertwined with the sampling of both `beta` and the dispersion `r`.
> I attempted to fully convert the code to the Matlab implementation, but I think I still need some more time to figure out the model, as the `phi` portion used in the Matlab code is intertwined with the sampling of both `beta` and the dispersion `r`.
That implementation references M. Zhou and L. Carin, "Negative Binomial Process Count and Mixture Modeling," arXiv:1209.3442, Sept. 2012, and M. Zhou and L. Carin, "Augment-and-Conquer Negative Binomial Processes," NIPS 2012. The former uses the CRTP distribution formulation for `L_i`, instead of the computations you're using from Zhou M., Li L., Dunson D., Carin L., "Lognormal and Gamma Mixed Negative Binomial Regression"; the latter states that the CRTP distribution is a sum of CRT variates, and that CRT variates can be drawn as a sum of Bernoullis.
The CRTP approach looks much simpler, as the reference implementation demonstrates, so we should go with that one.
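For reference, the sum-of-Bernoullis draw for a single CRT variate mentioned above looks roughly like this (a NumPy sketch, not the PR's implementation): `l ~ CRT(y, r)` is a sum of `y` independent Bernoulli variables with success probabilities `r / (r + j)` for `j = 0, ..., y - 1`.

```python
import numpy as np

def crt_sample(y, r, rng):
    # Draw l ~ CRT(y, r) as a sum of independent Bernoulli draws with
    # success probabilities r / (r + j) for j = 0, ..., y - 1.
    j = np.arange(y)
    return rng.binomial(1, r / (r + j)).sum()

rng = np.random.default_rng(2309)
l = crt_sample(y=20, r=1.0, rng=rng)
```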
Here I have finally updated this branch with a setup similar to the Matlab implementation.

1. I set up a `CRT_sum` and the sampling step for the dispersion term `r` almost exactly like the Matlab implementation's. However, I kept the sampling step for `beta` the same as before.
2. There are still quite a few issues with the current setup: `r` makes huge jumps between very large values (around 1e6) and very small values (around 1e-5), which is also one of the reasons the Pólya-Gamma sampling is problematic when the impressions have zeros in them, as its `h` parameter is composed of `y + r` (see the sketch below).
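To illustrate the zero-count issue, here is a sketch assuming the standard negative binomial Pólya-Gamma augmentation `ω_i ~ PG(y_i + r, ψ_i)` and the third-party `polyagamma` package, neither of which this PR necessarily uses:

```python
import numpy as np
from polyagamma import random_polyagamma

rng = np.random.default_rng(2309)

y = np.array([0, 0, 3, 7])  # counts, some of them zero
r = 1e-5                    # a dispersion draw that has collapsed toward 0
h = y + r                   # PG shape parameter; nearly zero wherever y == 0

# Draws with near-zero shape concentrate at zero, which destabilizes the
# downstream conditional updates.
omega = random_polyagamma(h, 0.0, random_state=rng)
```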
Don't forget to resolve the open conversations once they've been addressed.
The current failure is due to the problem addressed in https://github.com/aesara-devs/aesara/pull/1003.
This PR adds Gibbs steps that allow sampling the dispersion term of a negative binomial. It follows the compound Poisson method mentioned here.
So far we have roughly added the Gibbs steps for
The trickiest part so far comes from the sampling of `L_i`, as it is a discrete sample drawn from a probability vector, where that vector comes from a pre-calculated matrix with some additional computation.
The next step is to add `match` functions for the NB graph with dispersion terms.