taguhiM closed this 2 years ago
To reduce the time spent computing the V_n(t) coefficients, we are thinking of saving new values (for new n) into (possibly sparse) lookup tables. Moreover, some parallelization strategy might be used for the computations. These ideas are still under development.
> To reduce the time spent computing the V_n(t) coefficients, we are thinking of saving new values (for new n) into (possibly sparse) lookup tables. Moreover, some parallelization strategy might be used for the computations. These ideas are still under development.
Good! Let me add two comments on this. 1. What you're looking for is probably a "memoizer"; look up some examples online. 2. Before starting to optimise, you should better understand the impact this specific function has on the overall cost of the algorithm. Profile the code (there are tons of C++ profilers out there; I know of this one https://baptiste-wicht.com/posts/2011/09/profile-c-application-with-callgrind-kcachegrind.html but it's probably outdated and you can find something better online).
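For the record, a minimal memoizer along these lines could look like the sketch below. Here `compute_v` is a hypothetical stand-in for the actual V_n(t) computation (its placeholder formula and the map-based cache layout are assumptions for illustration, not the project's actual code):

```cpp
#include <cmath>
#include <map>
#include <utility>

// Hypothetical stand-in for the expensive V_n(t) computation.
double compute_v(int n, int t) {
  return std::lgamma(n + 1.0) - std::lgamma(t + 1.0);  // placeholder formula
}

// Memoized wrapper: each (n, t) pair is computed at most once and then
// served from the cache on every later call.
double memo_v(int n, int t) {
  static std::map<std::pair<int, int>, double> cache;
  auto key = std::make_pair(n, t);
  auto it = cache.find(key);
  if (it != cache.end()) return it->second;
  double val = compute_v(n, t);
  cache.emplace(key, val);
  return val;
}
```

The `static` cache persists across calls, so repeated queries during the MCMC pay the computation cost only once per distinct (n, t).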
Hi, we tried to print the execution time of the function computing the V_n's. It seems to be quite fast (at most 65 microseconds, roughly 30 on average on the galaxy data), so it shouldn't significantly impact the runtime of the algorithm. We also compared it with the running time of the DP, and they performed similarly.
At the moment we are using the Poisson distribution as a prior on the number of components (as suggested in the paper). Should we add more distributions (e.g. negative binomial)?
Apart from that we think the code is ready to be reviewed, let us know what you think about it.
[Attached plots: Galaxy_NNIG_MFM_06-02-2022_23:10:24 and Galaxy_LapNIG_MFM_06-02-2022_23:07:02]
These are some plots on the Galaxy dataset using the new mixture with NNIG and LapNIG hierarchies, respectively.
Thanks for the update, can you also upload the distribution of the number of clusters and its traceplot?
> Hi, we tried to print the execution time of the function computing the V_n's. It seems to be quite fast (at most 65 microseconds, roughly 30 on average on the galaxy data), so it shouldn't significantly impact the runtime of the algorithm. We also compared it with the running time of the DP, and they performed similarly.
Can you give us some more details on this? Specifically, what is the total runtime of the MCMC (consider for instance 10k iterations in total) with the DP and with the MFMixing?
> At the moment we are using the Poisson distribution as a prior on the number of components (as suggested in the paper). Should we add more distributions (e.g. negative binomial)?
I don't think it's super necessary at the moment. Also, some other guys are implementing a similar mixing but in the context of a conditional algorithm, see https://github.com/edoardopalli/bayesmix At some point, it might be worth joining efforts.
For 10k iterations on the Galaxy dataset these are the timings:
[3.627s] Running Neal8 algorithm (m=3 aux. blocks) with NNIG hierarchies, DP mixing... [9.036s] Computing log-density...
[3.518s] Running Neal8 algorithm (m=3 aux. blocks) with NNIG hierarchies, MFM mixing... [10.198s] Computing log-density...
[3.646s] Running Neal8 algorithm (m=3 aux. blocks) with LapNIG hierarchies, DP mixing... [10.631s] Computing log-density...
[4.44s] Running Neal8 algorithm (m=3 aux. blocks) with LapNIG hierarchies, MFM mixing... [13.015s] Computing log-density...
And these are the plots, if we understood correctly what you wanted: [Attached plots: Galaxy_LapNIG_DP_08-02-2022_23:04:48, Galaxy_LapNIG_MFM_08-02-2022_23:07:14, Galaxy_NNIG_DP_08-02-2022_23:00:19, Galaxy_NNIG_MFM_08-02-2022_22:58:57]
> For 10k iterations on the Galaxy dataset these are the timings:
> [3.627s] Running Neal8 algorithm (m=3 aux. blocks) with NNIG hierarchies, DP mixing... [9.036s] Computing log-density...
> [3.518s] Running Neal8 algorithm (m=3 aux. blocks) with NNIG hierarchies, MFM mixing... [10.198s] Computing log-density...
> [3.646s] Running Neal8 algorithm (m=3 aux. blocks) with LapNIG hierarchies, DP mixing... [10.631s] Computing log-density...
> [4.44s] Running Neal8 algorithm (m=3 aux. blocks) with LapNIG hierarchies, MFM mixing... [13.015s] Computing log-density...
Ok, then the increase in computational time with NNIG is not so negligible (25%); I would try to implement the lookup tables, since they might considerably speed up the execution. Can you also report what happens with NNIG when you use Neal3?
Good morning. In the current implementation the values of V are computed only when needed and stored in a vector, so they are not recomputed the next time the algorithm needs them. The paper reports a recursive formula to compute values of V from previous ones (in terms of t, the number of clusters, and n, the size of the dataset); our initial idea was to store precomputed values in a lookup table (in a separate file) and use the recursive formula to compute the values of V we need, starting from the values stored in the table. Does this work, or can you point us towards a better implementation?
> Good morning. In the current implementation the values of V are computed only when needed and stored in a vector, so they are not recomputed the next time the algorithm needs them. The paper reports a recursive formula to compute values of V from previous ones (in terms of t, the number of clusters, and n, the size of the dataset); our initial idea was to store precomputed values in a lookup table (in a separate file) and use the recursive formula to compute the values of V we need, starting from the values stored in the table. Does this work, or can you point us towards a better implementation?
I would use the recursive formula while storing the previously computed values in the vector; that should give us the most efficient implementation!
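As a sketch of what "compute on first access, then reuse" could look like with a plain vector (note that `compute_log_v` below is a trivial placeholder standing in for the real computation, e.g. via the recursion; the NaN-sentinel cache layout is an assumption):

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Placeholder for the real V computation (e.g. via the recursion).
double compute_log_v(int t) { return -0.5 * t; }

// Lazily filled cache: entry t holds log V_n(t) once computed, NaN before.
// Values are computed on first access and simply reused afterwards.
class LogVCache {
 public:
  double get(int t) {
    if (t >= static_cast<int>(vals_.size()))
      vals_.resize(t + 1, std::numeric_limits<double>::quiet_NaN());
    if (std::isnan(vals_[t])) vals_[t] = compute_log_v(t);
    return vals_[t];
  }

 private:
  std::vector<double> vals_;
};
```

Indexing by t into a contiguous vector keeps lookups O(1), which matters since the cache is queried at every MCMC iteration.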
In the recursive formula reported in the paper [image: WhatsApp Image 2022-02-09 at 17 17 15], to compute V_n(t+1) we need both V_n(t) and V_{n-1}(t), so in order to use this formula we need to compute the values V_0(y) for each y <= t and V_m(0) for m = 1, ..., n. Moreover, we have experienced underflow problems with the values of V as n and t increase, which is why we introduced the constant C we already discussed; with the recursion, that approach can't be preserved. Any suggestions on how to deal with this?
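One option we are considering is to work entirely in log scale instead of rescaling by C. This is only a sketch under two assumptions: that V_n(t) is the series coefficient from the Miller-Harrison paper (falling factorial over rising factorial, weighted by the prior on K), and that the prior is a shifted Poisson(lambda) on K (i.e. K-1 ~ Poisson(lambda)); the truncation rule is also ours:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Numerically stable log(exp(a) + exp(b)).
double log_add(double a, double b) {
  const double neg_inf = -std::numeric_limits<double>::infinity();
  if (a == neg_inf) return b;
  if (b == neg_inf) return a;
  double m = std::max(a, b);
  return m + std::log1p(std::exp(std::min(a, b) - m));
}

// log V_n(t) for the MFM coefficient
//   V_n(t) = sum_{k >= max(t,1)} k_(t) / (gamma*k)^(n) * p_K(k),
// with k_(t) a falling factorial and (gamma*k)^(n) a rising factorial,
// assuming K - 1 ~ Poisson(lambda). Every term is built in log scale via
// lgamma, so no intermediate value under- or overflows; the series is
// truncated once new terms stop contributing (heuristic cutoff).
double log_v(int n, int t, double gamma, double lambda) {
  double logsum = -std::numeric_limits<double>::infinity();
  for (int k = std::max(t, 1); k < t + 1000; ++k) {
    double log_falling = std::lgamma(k + 1.0) - std::lgamma(k - t + 1.0);
    double log_rising = std::lgamma(gamma * k + n) - std::lgamma(gamma * k);
    double log_prior = (k - 1) * std::log(lambda) - lambda - std::lgamma(k);
    double next = log_add(logsum, log_falling - log_rising + log_prior);
    if (k > t + 10 && next - logsum < 1e-14) return next;
    logsum = next;
  }
  return logsum;
}
```

A quick sanity check: for n = t = 1 the series collapses to 1/gamma regardless of the prior, so log_v(1, 1, gamma, lambda) should be -log(gamma).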
We also ran 10000 iterations 10 times for each case and averaged the execution times of the algorithm to have a better estimate of the time.
Galaxy data:
NNIG_MFM_Neal8: 1.4969 s
NNIG_DP_Neal8: 1.3047 s
LapNIG_MFM_Neal8: 1.7887 s
LapNIG_DP_Neal8: 1.5524 s

Tutorial data:
NNIG_MFM_Neal8_tutorial: 2.9813 s
NNIG_DP_Neal8_tutorial: 2.864 s
LapNIG_MFM_Neal8_tutorial: 3.2979 s
LapNIG_DP_Neal8_tutorial: 3.473 s
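For reference, the averaging amounts to something like the following sketch (`run_mcmc` is a hypothetical stand-in for one full run of 10000 iterations, not the actual bayesmix entry point):

```cpp
#include <chrono>

// Hypothetical stand-in for one full MCMC run (e.g. 10000 iterations).
void run_mcmc() { /* model fitting would go here */ }

// Average wall-clock seconds over n_reps independent repetitions,
// which smooths out run-to-run noise in the reported timings.
double avg_seconds(int n_reps) {
  double total = 0.0;
  for (int r = 0; r < n_reps; ++r) {
    auto start = std::chrono::steady_clock::now();
    run_mcmc();
    auto end = std::chrono::steady_clock::now();
    total += std::chrono::duration<double>(end - start).count();
  }
  return total / n_reps;
}
```

Using `steady_clock` (rather than `system_clock`) avoids distortions from clock adjustments during a run.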
> We also ran 10000 iterations 10 times for each case and averaged the execution times of the algorithm to have a better estimate of the time.
> Galaxy data: NNIG_MFM_Neal8: 1.4969 s, NNIG_DP_Neal8: 1.3047 s, LapNIG_MFM_Neal8: 1.7887 s, LapNIG_DP_Neal8: 1.5524 s
> Tutorial data: NNIG_MFM_Neal8_tutorial: 2.9813 s, NNIG_DP_Neal8_tutorial: 2.864 s, LapNIG_MFM_Neal8_tutorial: 3.2979 s, LapNIG_DP_Neal8_tutorial: 3.473 s
Ok, then it's fine for me; let's wait for the other reviews. I'm happy with this at the moment.
Hi again, sorry to bother. You should make one last commit to trigger the test suite
Great, if you've addressed all of Bruno's comments, merge at will
This is the initial implementation of the Mixture of Finite Mixtures (MFM) model from the paper "Mixture Models with a Prior on the Number of Components" by Miller and Harrison. The code works, even though it is not efficient yet.