deruncie / MegaLMM

R package for fitting high-dimensional multivariate linear mixed effect models
https://deruncie.github.io/MegaLMM/

MCMC convergence for latent factor parameters. #2

Open · tpbilton opened this issue 4 years ago

tpbilton commented 4 years ago

Hi,

I want to say a big thanks for putting in the effort to develop this R package. I quite like the approach you have taken to fitting the multivariate LMM.

I have a query regarding convergence of the MCMC chains. I ran the code in the .Rmd file of the vignette, but changed lines 172-187 to:

n_iter = 1000  # how many samples to collect at once?
for(i in 1:70) {
  print(sprintf('Run %d',i))
  MegaLMM_state = sample_MegaLMM(MegaLMM_state,n_iter)  # run the MCMC chain for n_iter iterations. grainSize is a parameter for parallelization (smaller = more parallelization)

  MegaLMM_state = save_posterior_chunk(MegaLMM_state)  # save any accumulated posterior samples in the database to release memory
  print(MegaLMM_state) # print status of current chain
  plot(MegaLMM_state) # make some diagnostic plots. These are saved in a pdf booklet: diagnostic_plots.pdf

  # set of commands to run during the burn-in period to help the chain converge
  if(MegaLMM_state$current_state$nrun < MegaLMM_state$run_parameters$burn || i < 20) {
    MegaLMM_state = reorder_factors(MegaLMM_state,drop_cor_threshold = 0.6) # Factor order doesn't "mix" well in the MCMC. We can help it by manually re-ordering from biggest to smallest
    MegaLMM_state = clear_Posterior(MegaLMM_state)
    print(MegaLMM_state$run_parameters$burn)
  }
}

The diagnostic plots from the plot function are here: https://github.com/deruncie/MegaLMM/files/5081878/diagnostics_plots.pdf. I mainly want to focus on the last page of the PDF. Looking at the code, I'm guessing that the different colored lines correspond to the five traits (out of the 100 traits in this dataset) with the largest mean Lambda value. It is a bit disconcerting, however, to see for a simulated dataset that the traceplots for some of the Lambda values do not seem to be settling down to a converged distribution, but keep jumping around even after a large number of iterations (though factors 1-6 look good). Personally, I would be uncomfortable with these trace plots; to me, they suggest that there might be some multi-modality in the posterior distributions. However, I'm curious to know whether you have ways to improve the convergence of the MCMC chains in MegaLMM, or whether such trace plots are not a concern (and if so, why).
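For reference, a minimal sketch of how one might pull the Lambda samples out of the saved posterior database and draw these traces manually. This assumes load_posterior_param() returns the samples as a (samples x factors x traits) array; the exact return shape may need checking against the documentation.

# Hedged sketch: extract the saved Lambda samples and draw trace plots for the
# traits with the largest mean |loading| on a chosen factor.
# Assumes load_posterior_param() returns a (samples x factors x traits) array.
library(MegaLMM)

Lambda_samples = load_posterior_param(MegaLMM_state, 'Lambda')

plot_factor_traces = function(Lambda_samples, factor = 1, n_traits = 5) {
  loadings = Lambda_samples[, factor, ]   # matrix: samples x traits
  top = order(colMeans(abs(loadings)), decreasing = TRUE)[1:n_traits]
  matplot(loadings[, top], type = 'l', lty = 1,
          xlab = 'saved iteration',
          ylab = sprintf('Lambda[%d, ]', factor),
          main = sprintf('Factor %d: %d traits with largest mean |loading|',
                         factor, n_traits))
}

plot_factor_traces(Lambda_samples, factor = 7)   # e.g. one of the jumpy factors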

Many thanks, Timothy

deruncie commented 4 years ago

Hi Timothy,

Thanks for these comments. Yes, I agree that convergence of MegaLMM is an issue. I believe that the problem is that several aspects of the factors are only weakly identifiable, or not identifiable at all:

1. the ordering of the factors;
2. the number of factors;
3. the orientation of the factors (multiplying a row of Lambda by -1, together with the corresponding column of F, won't change the posterior).
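A tiny base-R illustration of the first and third invariances, with arbitrary simulated matrices: permuting the factors, or flipping the sign of a factor together with its loadings, leaves the product F %*% Lambda (and hence the likelihood) unchanged.

# Permuting factors, or flipping the sign of a factor and its loadings,
# leaves F %*% Lambda unchanged, so the likelihood cannot distinguish them.
set.seed(1)
n = 50; K = 4; p = 10
F_mat  = matrix(rnorm(n * K), n, K)   # factor scores (n individuals x K factors)
Lambda = matrix(rnorm(K * p), K, p)   # factor loadings (K factors x p traits)

perm = sample(K)                      # arbitrary re-ordering of the factors
all.equal(F_mat %*% Lambda, F_mat[, perm] %*% Lambda[perm, ])              # TRUE

s = c(-1, 1, 1, -1)                   # flip the sign of factors 1 and 4
all.equal(F_mat %*% Lambda, (F_mat %*% diag(s)) %*% (diag(s) %*% Lambda))  # TRUE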

In any factor model, factors can be swapped and re-ordered at will without changing the product F*Lambda, so there's no information in the likelihood about the ordering of the factors. Also, in traditional factor models, factors can be arbitrarily rotated without affecting the likelihood. We use an approximate sparsity-inducing prior on the factor loadings to make the rotations identifiable, and this generally works. But the factor ordering would still be an issue, making for a multi-modal posterior.

To partially solve this, we use an ordering prior under which the variance of the elements of Lambda decreases stochastically with increasing row index. However, whenever two factors (rows of Lambda) have equal variance, this ordering is not identifiable in the posterior, and in small sample sizes the ordering may be highly uncertain. Unfortunately (or fortunately, depending on how you look at it), mixing of the factor ordering is very poor in the Gibbs sampler. I could try to add some bigger jumps in a Metropolis-Hastings step, but I think this would be difficult. Instead, I've added the function reorder_factors, which manually swaps factors to avoid the poor mixing of the factor order (see the sketch below for the basic idea). I find that once I get the factors in approximately the right order, the Gibbs sampler is sticky enough that they tend to stay there long enough for sample averages and variances to be useful approximations. Yes, because the posterior is multi-modal this is not the true posterior uncertainty, but I think it is an adequate practical solution.
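Conceptually, the re-ordering step sorts the factors from biggest to smallest by the magnitude of their loadings, applying the same permutation to F so that F %*% Lambda is unchanged. Here is a sketch of that idea only, not the actual reorder_factors implementation:

# Conceptual sketch only (not the reorder_factors code): re-order the factors
# from biggest to smallest by the total magnitude of their loadings, applying
# the same permutation to F so that F %*% Lambda is unchanged.
reorder_by_loading_size = function(F_mat, Lambda) {
  size = rowSums(Lambda^2)              # one magnitude per factor (row of Lambda)
  perm = order(size, decreasing = TRUE)
  list(F = F_mat[, perm, drop = FALSE],
       Lambda = Lambda[perm, , drop = FALSE])
}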

In short, those are the reasons why I am generally comfortable with it as is.

Does this make sense to you? I’ve tried to find other ways to improve the mixing, but because of the inherent label switching problem in factor models, I’m not sure that there is a good solution for this model.

Dan


tpbilton commented 4 years ago

Hi Dan,

I really appreciate your in-depth response. Yes, what you have said makes a lot of sense and sheds a lot of light on what I'm seeing in the results.

One take-away, then, is that the trace plots of Lambda may not be the most important thing to look at (though they are helpful for determining whether you have enough factors); what matters more are the parameters one is actually interested in estimating. In my case, I'm wanting to use MegaLMM mainly for microbiome data and am mostly interested in the beta coefficients, so I'll have a look at the trace plots for those parameters (something like the sketch below).
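For example, something like the following, though I'm guessing at 'B1' as the name under which the fixed-effect coefficients are stored in the posterior database; the actual name would need checking against what the model setup records.

# Hedged sketch: check convergence of the regression coefficients instead of
# Lambda. 'B1' is a guess at the stored parameter name for the fixed effects;
# substitute whichever name the model setup actually records.
library(coda)

B_samples = load_posterior_param(MegaLMM_state, 'B1')  # assumed: samples x covariates x traits

b = mcmc(B_samples[, 1, 3])   # e.g. covariate 1, trait 3
plot(b)                       # trace + density plot from coda
effectiveSize(b)              # rough effective sample size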

I think you should put some of the points you have made into the documentation, maybe as a separate tutorial for more advanced users (though I'm guessing time is probably an issue for you). These are some really important points from a practical point of view when implementing MegaLMM on real data.

Not to take away from the good work you have done, but it would be nice if these issues could be sorted out, or if there were some mathematical theory showing that the posteriors for the parameters of interest are not hugely affected by these convergence issues. I'm keen to see some more development of MegaLMM and happy to help out if possible.

I'll do a bit more playing around with some data and look at the convergence for the actual parameters of interest.

Cheers, Timothy