@xjing76 You might want to consider setting up pre-commit locally using the config file in this repo; the style-check tests are currently failing.
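For reference, the usual pre-commit workflow (assuming the standard `.pre-commit-config.yaml` already at the root of this repo) is roughly:

```console
pip install pre-commit
pre-commit install          # run the hooks automatically on each commit
pre-commit run --all-files  # check the whole tree once, up front
```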
Great! Following what @zoj613 said: could you install pre-commit and make sure that *every commit* at least passes the code checks, and not start on a new commit until the current one passes the tests as well?
Merging #28 (e0d64ae) into main (5287dd8) will decrease coverage by 3.78%. The diff coverage is 83.63%.

:exclamation: Current head e0d64ae differs from pull request most recent head df88f78. Consider uploading reports for the commit df88f78 to get more accurate results.

```diff
@@            Coverage Diff             @@
##              main      #28      +/-  ##
===========================================
- Coverage   100.00%   96.21%    -3.79%
===========================================
  Files            4        2        -2
  Lines          241      238        -3
  Branches        19       13        -6
===========================================
- Hits           241      229       -12
- Misses           0        8        +8
- Partials         0        1        +1
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| aemcmc/gibbs.py | 95.23% <83.63%> (-4.77%) | :arrow_down: |
| aemcmc/utils.py | | |
| aemcmc/conjugates.py | | |

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 5287dd8...df88f78.
I think I have set up the main steps for the alpha Gibbs sampler. However, I need a simpler way to compute the pre-calculated matrix `F`: its size is determined by the maximum of the observations, so it can get quite large.

I am also still looking for a way to make `srng.choice(range(N), size=1, replace=True, p=R_r(F, r, i)) for i in y` work, as it seems `p` cannot be assigned a random variable.
Great! For (2) could you provide a minimal example that illustrates the problem you're having so I can have a look?
I think the problem itself is not what I thought it was initially, but the problem still persists: it seems that I am not able to iterate through `y` (it fails with a "can't determine dimension" error):

`li = [srng.choice(at.arange(N), size=1, replace=True, p=R_r(F, r, i)) for i in y]`
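For what it's worth, a Python list comprehension over a symbolic `y` cannot work in general, because the length of `y` is unknown when the graph is built. One possible symbolic alternative is `aesara.scan`; the sketch below is only illustrative, with the real `R_r` replaced by a placeholder that normalizes a row of `F`:

```python
import aesara
import aesara.tensor as at
from aesara.tensor.random.utils import RandomStream

srng = RandomStream(seed=2309)
N = 10

y = at.lvector("y")
F = at.matrix("F")

def R_r_placeholder(F, y_i):
    # Placeholder for the real R_r computation: normalize row y_i of F
    # into a probability vector.
    row = F[y_i]
    return row / row.sum()

def step(y_i, F):
    # One draw per observation y_i, with a per-observation probability vector.
    return srng.choice(at.arange(N), size=1, replace=True, p=R_r_placeholder(F, y_i))

l_i, updates = aesara.scan(step, sequences=[y], non_sequences=[F])
```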
But based on equations (23-25) in the paper, we would need to sample `l_i` separately for each new `r` and each `y_i`:

`li = srng.choice(at.arange(N), size=N, replace=True, p=R_r(F, r, 10))`

This does not work either, as at some steps the resulting `p` vector is full of `nan`s. Any suggestions on how I should debug this?
Do you have a setup that allows you to work with a debugger such as pdb? You should then be able to go up the stack trace when said line fails and investigate where these `NaN`s come from.
I do have `--pdb` set up, but the results are coming from a graph op and there are more arguments than what we supply. I will take a further look into it. Thanks!
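One other option that may help when the failure happens inside a compiled graph: Aesara ships a `NanGuardMode` that raises as soon as a `NaN` appears in any intermediate value, pointing at the offending op. A minimal sketch (the computation here is a placeholder, not the sampler's graph):

```python
import aesara
import aesara.tensor as at
from aesara.compile.nanguardmode import NanGuardMode

x = at.vector("x")
y = at.log(x)  # placeholder computation; produces NaN for negative inputs

f = aesara.function(
    [x],
    y,
    mode=NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=False),
)

f([-1.0, 2.0])  # raises, identifying the op that produced the NaN
```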
1. On making the `F` and `R_r` matrices numerically stable: I didn't use the Stirling-number method mentioned in this paper (lemma 2.1), as Stirling numbers can overflow pretty easily around 100. Instead I added a switch, as in the paper, so that when `r > 1` we use the unweighted matrix `F` instead of the weighted matrix `R_r` (see the first sketch after this list).
2. `ChoiceRV` in Aesara only takes a vector as `p`, so I am coming up with a `MultiChoiceRV` here (which is not working yet) to solve this issue temporarily, though I am hoping we might eventually add a `ChoiceRV` to Aesara that can handle this case (see the second sketch after this list). So far, I think the engineering for the `MultiChoiceRV` is done: the shape and unit tests are passing without much issue. However, it seems that the sampling step itself is not converging: looking at the samples for `r`, they keep getting larger and larger, while the true dispersion term is only 1. That will be my next step.
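As referenced in item 1, a minimal sketch of that kind of switch, assuming `F` and `R_r` are already available as tensors (all names here are placeholders):

```python
import aesara.tensor as at

F = at.matrix("F")      # unweighted, numerically safer matrix
R_r = at.matrix("R_r")  # weighted matrix
r = at.scalar("r")

# Use the unweighted matrix F when r > 1, the weighted matrix R_r otherwise.
M = at.switch(at.gt(r, 1.0), F, R_r)
```

And for item 2: this is not the PR's `MultiChoiceRV`, but one common way to draw one categorical sample per row of a probability matrix with existing ops is the inverse-CDF trick, sketched here:

```python
import aesara
import aesara.tensor as at
from aesara.tensor.random.utils import RandomStream

srng = RandomStream(seed=2309)

# P is an (n, N) matrix whose rows are probability vectors, e.g. one row
# of R_r-derived probabilities per observation y_i.
P = at.matrix("P")

u = srng.uniform(0.0, 1.0, size=(P.shape[0],))
cdf = at.cumsum(P, axis=1)
# For each row, count how many cumulative probabilities the uniform draw
# exceeds; that count is the sampled category index.
l_i = at.sum(u[:, None] > cdf, axis=1)

sample_l = aesara.function([P], l_i)
```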
I updated the parameters of the `l_i` sampling step to what is set in the Matlab example, and the sample results seem a bit more reasonable and better behaved.

I think it may not be enough to just test the shape of the samples; we should also test the values of those samples.
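For example, a hypothetical sketch of such a value-level check (the function name and tolerance are illustrative assumptions, not the repo's test suite):

```python
import numpy as np

def check_dispersion_recovery(dispersion_samples, true_r=1.0, tol=0.5):
    # `dispersion_samples` is assumed to hold posterior draws of r from data
    # simulated with dispersion `true_r`; drop the first half as burn-in,
    # then require the posterior mean to be near the truth.
    post = np.asarray(dispersion_samples)[len(dispersion_samples) // 2 :]
    assert np.abs(post.mean() - true_r) < tol
```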
However, the dispersion samples seem unreasonably large:

```
array([[99372.56587824],
       [ 4997.51240255],
       [ 6943.54013679],
       ...,
       [ 6448.89043   ],
       [ 5958.37300573],
       [ 6246.96751815]])
```
Looking only at `test_horseshoe_nbinom`, the betas sampled in that setup are not converging very well either, so I set up a separate branch that fixes the betas and only samples the dispersion part. However, the dispersion portion still converges to a much larger number than the value we preset (1). I attempted to fully convert the code to the Matlab implementation, but I think I still need some more time to figure out the model, as the `phi` portion used in the Matlab code is intertwined with the sampling of both `beta` and the dispersion `r`.
> I attempted to fully convert the code to the Matlab implementation, but I think I still need some more time to figure out the model, as the `phi` portion used in the Matlab code is intertwined with the sampling of both `beta` and the dispersion `r`.
That implementation references M. Zhou and L. Carin, "Negative Binomial Process Count and Mixture Modeling," arXiv:1209.3442, Sept. 2012, and M. Zhou and L. Carin, "Augment-and-Conquer Negative Binomial Processes," NIPS 2012. The former uses the CRTP distribution formulation for `L_i`, instead of the computations you're using from Zhou M., Li L., Dunson D., Carin L., "Lognormal and Gamma Mixed Negative Binomial Regression"; the latter states that the CRTP distribution is a sum of CRT variates, and that CRT variates can be drawn as a sum of Bernoullis.
The CRTP approach looks much simpler, as the reference implementation demonstrates, so we should go with that one.
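For reference, the sum-of-Bernoullis draw for a single CRT variate mentioned above looks roughly like this (a NumPy sketch, not the PR's implementation): `l ~ CRT(y, r)` is a sum of `y` independent Bernoulli variables with success probabilities `r / (r + j)` for `j = 0, ..., y - 1`.

```python
import numpy as np

def crt_sample(y, r, rng):
    # Draw l ~ CRT(y, r) as a sum of independent Bernoulli draws with
    # success probabilities r / (r + j) for j = 0, ..., y - 1.
    j = np.arange(y)
    return rng.binomial(1, r / (r + j)).sum()

rng = np.random.default_rng(2309)
l = crt_sample(y=20, r=1.0, rng=rng)
```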
Here I have finally updated this branch with a setup similar to the Matlab implementation.

1. I set up a `CRT_sum` and the sampling step for the dispersion term `r` almost exactly like the Matlab implementation's. However, I kept the sampling step for `beta` the same as before.
2. There are still quite a few issues with the current setup: `r` makes huge jumps between very large values (around 1e6) and very small values (around 1e-5), which is also one of the reasons the Pólya-Gamma sampling is problematic when the impressions have zeros in them, as its `h` parameter is composed of `y + r` (see the sketch below).
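To illustrate the zero-count issue, here is a sketch assuming the standard negative binomial Pólya-Gamma augmentation `ω_i ~ PG(y_i + r, ψ_i)` and the third-party `polyagamma` package, neither of which this PR necessarily uses:

```python
import numpy as np
from polyagamma import random_polyagamma

rng = np.random.default_rng(2309)

y = np.array([0, 0, 3, 7])  # counts, some of them zero
r = 1e-5                    # a dispersion draw that has collapsed toward 0
h = y + r                   # PG shape parameter; nearly zero wherever y == 0

# Draws with near-zero shape concentrate at zero, which destabilizes the
# downstream conditional updates.
omega = random_polyagamma(h, 0.0, random_state=rng)
```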
Don't forget to resolve the open conversations once they've been addressed.
The current failure is due to the problem addressed in https://github.com/aesara-devs/aesara/pull/1003.
This PR adds Gibbs steps that allow sampling the dispersion term of a negative binomial. It follows the compound Poisson method mentioned here.
So far we have roughly added the Gibbs steps for
The trickiest part so far comes from the sampling of `L_i`, as it is a discrete sample drawn from a probability vector, where that vector comes from a pre-calculated matrix with some additional computation.
The next step is to add `match` functions for the NB graph with dispersion terms.