bayesmix-dev / bayesmix

Flexible Bayesian nonparametric mixture models in C++
https://bayesmix.rtfd.io
BSD 3-Clause "New" or "Revised" License

Woodbury Identity for likelihood evaluation FA #122

Closed: giacomodecarlo closed this 2 years ago

giacomodecarlo commented 2 years ago

Good afternoon everyone, I just changed the code to get a much faster likelihood evaluation as discussed in this post.

I attach a small pdf describing what is now kept in the state: Woodbury.pdf. The speedup with p = 500 (number of features) and q = 5 (number of latent factors) is about 100x, and it grows as p grows. I still need to double-check the computations in the code, but it already seems to be working.
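For readers following along, the two standard identities being exploited here (textbook results, not taken from the attached pdf) apply to the factor-analytic covariance $\Sigma = \Psi + \Lambda\Lambda^\top$, with $\Psi \in \mathbb{R}^{p \times p}$ diagonal and $\Lambda \in \mathbb{R}^{p \times q}$:

$$
\Sigma^{-1} = \Psi^{-1} - \Psi^{-1}\Lambda\,(I_q + \Lambda^\top\Psi^{-1}\Lambda)^{-1}\Lambda^\top\Psi^{-1},
\qquad
\log\det\Sigma = \log\det(I_q + \Lambda^\top\Psi^{-1}\Lambda) + \log\det\Psi.
$$

Only a $q \times q$ matrix is ever factorized instead of a $p \times p$ one, which is where the speedup for $q \ll p$ comes from.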

mberaha commented 2 years ago

Thanks @giacomodecarlo! The code looks good to me. I would generalise it a little further and add the following functions to the distribution.h file:

// here wood_factor is L^{-1} * Lambda^T * Psi^{-1}, with L the Cholesky
// factor of I + Lambda^T * Psi^{-1} * Lambda
double multi_normal_lpdf_woodbury_chol(const Eigen::VectorXd &y,
                                       const Eigen::VectorXd &mean,
                                       const Eigen::VectorXd &sigma_diag,
                                       const Eigen::MatrixXd &wood_factor,
                                       double cov_logdet) {
  Eigen::VectorXd diff = y - mean;
  double exponent =
      -0.5 * (diff.cwiseQuotient(sigma_diag).dot(diff) -
              (wood_factor * diff).squaredNorm());
  double base = -0.5 * cov_logdet + NEG_LOG_SQRT_TWO_PI * y.size();
  return base + exponent;
}

double multi_normal_lpdf_woodbury(Eigen::VectorXd y, Eigen::VectorXd mean, Eigen::VectorXd sigma_diag, Eigen::MatrixXd lambda) {
  Eigen::MatrixXd wood_chol = ...
  return multi_normal_lpdf_woodbury_chol(...);
}

then it becomes very easy to test that multi_normal_lpdf_woodbury returns the same value as stan::math::multi_normal_lpdf

One last question: is your implementation scaling linearly in p? I know that naively one would still scale quadratically in p.

giacomodecarlo commented 2 years ago

I would say it scales linearly, but a more careful complexity analysis would be needed: I don't think any single term clearly dominates. My guess would be something like O(pq^2n*g), though I am not sure about n and g now that the likelihood evaluation is fast. Empirically, with p = 5000 it takes about 10 times as long as with p = 500. Using the parametrization of cov_wood in the code, there is no (pxq)(qxp) matrix multiplication (only the diagonal one, which is clearly faster), so p^2 should never appear in the complexity; but again, I should check the other parts of the algorithm more carefully.
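To make the scaling argument concrete, here is a rough operation count (my own back-of-the-envelope reading, not verified against the full sampler). With L the Cholesky factor of I_q + Lambda^T Psi^{-1} Lambda:

- forming I_q + Lambda^T Psi^{-1} Lambda: O(pq^2)
- Cholesky factorization of that q x q matrix: O(q^3)
- forming wood_factor = L^{-1} Lambda^T Psi^{-1}: O(pq^2)
- each likelihood evaluation given wood_factor: O(pq)

So each term is linear in p for fixed q, which is consistent with the observed ~10x runtime when going from p = 500 to p = 5000.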

I'm now going to implement these functions and test them against the stan implementation.

giacomodecarlo commented 2 years ago

I implemented a test checking that multi_normal_lpdf_woodbury returns the same value as stan::math::multi_normal_lpdf, and I also manually verified, with some prints inside the likelihood evaluation in the hierarchy, that the function works properly.

giacomodecarlo commented 2 years ago

> Just one last minor comment for the documentation, then it's good to go!

I just added the comment, thanks!

mberaha commented 2 years ago

Thanks @giacomodecarlo for the great effort! Very nice contribution!