bioFAM / MOFA

Multi-Omics Factor Analysis
GNU Lesser General Public License v3.0

Question on initialization #54

Closed BennyStrobes closed 4 years ago

BennyStrobes commented 4 years ago

Hi. I have a question about the initialization of the model. The supplement of your paper says that variables are initialized according to their priors. For the gamma-distributed precision variables such as alpha_k^m (whose prior is parameterized by alpha_0 = beta_0 = 1e-14), most random samples drawn from a gamma distribution with this parameterization will be zero, so most of the initialized values of alpha_k^m would be zero. Do you then randomly draw w_{k,d}^m from a Gaussian with infinite (1/0) variance?

rargelaguet commented 4 years ago

Hi Ben, we initialise the expectation of the variational distribution q(alpha) with the expectation of its prior distribution p(alpha). For a gamma distribution, this is alpha_0/beta_0 = 1. The weights do not need to be initialised, because we defined them to be the first update in the variational EM algorithm. If one did need to initialise them, they would be sampled from N(0, 1/E[alpha]), i.e. N(0, 1).
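To make the difference concrete, here is a minimal NumPy sketch (not taken from the MOFA code base) that contrasts sampling alpha from its prior with using the prior expectation. The sizes `D` and `K`, the seed, and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

a0 = b0 = 1e-14  # shape and rate of the gamma prior on alpha, as quoted in this thread

# Option 1: sample alpha from the prior.
# NumPy parameterises the gamma distribution by shape and scale (= 1 / rate).
prior_samples = rng.gamma(shape=a0, scale=1.0 / b0, size=1000)
frac_zero = np.mean(prior_samples == 0.0)

# Option 2: use the prior expectation, E[alpha] = a0 / b0.
E_alpha = a0 / b0

# Optional weight initialisation from N(0, 1 / E[alpha]).
# D (features) and K (factors) are hypothetical sizes.
D, K = 100, 5
W_init = rng.normal(loc=0.0, scale=np.sqrt(1.0 / E_alpha), size=(D, K))

print(f"fraction of prior samples equal to zero: {frac_zero:.3f}")
print(f"E[alpha] = {E_alpha}")
```

With such a small shape parameter, the sampled values underflow to zero in floating point, which is the behaviour described in the question. The expectation a0/b0 depends only on the ratio of the two hyperparameters, so it stays well defined.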

Having said this, we did discuss whether setting alpha_0 and beta_0 to 1e-3 would be more appropriate. We explored it, but the results did not change much. Hope this makes sense.

P.S. Please consider switching to MOFA v2 (https://github.com/bioFAM/MOFA2). The implementation is much more readable if you want to dig into it. Here is the file with the initialisations: https://github.com/bioFAM/MOFA2/blob/master/mofapy2/build_model/init_model.py#L351

Best, Ricard.

rargelaguet commented 4 years ago

Also, in case you are working with matrix factorisation implementations, I want to mention that we obtain a better ELBO and faster convergence when initialising the factors and the weights using a maximum likelihood estimate (or rather, just the PCA solution).
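For readers who want to try this, a PCA-based initialisation could look roughly like the sketch below. It is an illustration under simplifying assumptions (a single complete data matrix with no missing values), not the MOFA implementation; `pca_init`, `Y`, and `K` are hypothetical names.

```python
import numpy as np

def pca_init(Y, K):
    """Return factors Z (N x K) and weights W (D x K) from a rank-K truncated SVD
    of the column-centred data, so that Z @ W.T approximates the centred Y."""
    Yc = Y - Y.mean(axis=0)                      # centre each feature
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    Z = U[:, :K] * s[:K]                         # principal component scores
    W = Vt[:K].T                                 # principal axes (loadings)
    return Z, W

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 20))                    # toy data: 50 samples, 20 features
Z0, W0 = pca_init(Y, K=5)
print(Z0.shape, W0.shape)
```

`Z0` and `W0` would then serve as starting values for the expectations of the factors and weights in place of random draws.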