deruncie / MegaLMM

R package for fitting high-dimensional multivariate linear mixed effect models
https://deruncie.github.io/MegaLMM/

MCMC convergence for latent factor parameters. #2

Open · tpbilton opened this issue 4 years ago

tpbilton commented 4 years ago

Hi,

I want to say a big thanks for putting in the effort to develop this R package. I quite like the approach you have taken to fitting the multivariate LMM.

I have a query regarding convergence of the MCMC chains. I ran the code in the .Rmd file of the vignette, but changed lines 172-187 to:

n_iter = 1000  # how many samples to collect at once?
for(i in 1:70) {
  print(sprintf('Run %d',i))
  MegaLMM_state = sample_MegaLMM(MegaLMM_state,n_iter)  # run the MCMC chain for n_iter iterations. grainSize is a parameter for parallelization (smaller = more parallelization)

  MegaLMM_state = save_posterior_chunk(MegaLMM_state)  # save any accumulated posterior samples in the database to release memory
  print(MegaLMM_state) # print status of current chain
  plot(MegaLMM_state) # make some diagnostic plots. These are saved in a pdf booklet: diagnostic_plots.pdf

  # set of commands to run during the burn-in period to help the chain converge
  if(MegaLMM_state$current_state$nrun < MegaLMM_state$run_parameters$burn || i < 20) {
    MegaLMM_state = reorder_factors(MegaLMM_state,drop_cor_threshold = 0.6) # Factor order doesn't "mix" well in the MCMC. We can help it by manually re-ordering from biggest to smallest
    MegaLMM_state = clear_Posterior(MegaLMM_state)
    print(MegaLMM_state$run_parameters$burn)
  }
}

The diagnostic plots from the plot function are here: https://github.com/deruncie/MegaLMM/files/5081878/diagnostics_plots.pdf. I mainly want to focus on the last page of the PDF. Looking at the code, I'm guessing that the different colored lines correspond to the five traits (out of the 100 traits in this dataset) with the largest mean Lambda value. It is a bit disconcerting, however, to see for a simulated dataset that the traceplots for some of the Lambda values do not seem to be settling down to a converged distribution, but keep jumping around even after a large number of iterations (though factors 1-6 look good). Personally, I would be uncomfortable with these trace plots; to me, they suggest that there might be some multi-modality in the posterior distributions. However, I'm curious to know whether you have ways to improve the convergence of the MCMC chains in MegaLMM, or whether such trace plots are not a concern (and if so, why).
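For reference, a minimal sketch of how one might pull the Lambda samples out of the saved posterior database and draw these traces manually. This assumes load_posterior_param() returns the samples as a (samples x factors x traits) array; the exact return shape may need checking against the documentation.

# Hedged sketch: extract the saved Lambda samples and draw trace plots for the
# traits with the largest mean |loading| on a chosen factor.
# Assumes load_posterior_param() returns a (samples x factors x traits) array.
library(MegaLMM)

Lambda_samples = load_posterior_param(MegaLMM_state, 'Lambda')

plot_factor_traces = function(Lambda_samples, factor = 1, n_traits = 5) {
  loadings = Lambda_samples[, factor, ]   # matrix: samples x traits
  top = order(colMeans(abs(loadings)), decreasing = TRUE)[1:n_traits]
  matplot(loadings[, top], type = 'l', lty = 1,
          xlab = 'saved iteration',
          ylab = sprintf('Lambda[%d, ]', factor),
          main = sprintf('Factor %d: %d traits with largest mean |loading|',
                         factor, n_traits))
}

plot_factor_traces(Lambda_samples, factor = 7)   # e.g. one of the jumpy factors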

Many thanks, Timothy

deruncie commented 4 years ago

Hi Timothy,

Thanks for these comments. Yes, I agree that convergence of MegaLMM is an issue. I believe that the problem is that several aspects of the factors are only weakly identifiable, or not identifiable at all:

1. the ordering of the factors;
2. the number of factors;
3. the orientation of the factors (multiplying a row of Lambda by -1, together with the corresponding column of F, won't change the posterior).
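A tiny base-R illustration of the first and third invariances, with arbitrary simulated matrices: permuting the factors, or flipping the sign of a factor together with its loadings, leaves the product F %*% Lambda (and hence the likelihood) unchanged.

# Permuting factors, or flipping the sign of a factor and its loadings,
# leaves F %*% Lambda unchanged, so the likelihood cannot distinguish them.
set.seed(1)
n = 50; K = 4; p = 10
F_mat  = matrix(rnorm(n * K), n, K)   # factor scores (n individuals x K factors)
Lambda = matrix(rnorm(K * p), K, p)   # factor loadings (K factors x p traits)

perm = sample(K)                      # arbitrary re-ordering of the factors
all.equal(F_mat %*% Lambda, F_mat[, perm] %*% Lambda[perm, ])              # TRUE

s = c(-1, 1, 1, -1)                   # flip the sign of factors 1 and 4
all.equal(F_mat %*% Lambda, (F_mat %*% diag(s)) %*% (diag(s) %*% Lambda))  # TRUE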

In any factor model, factors can be swapped and re-ordered at will without changing the product F*Lambda, so there's no information in the likelihood about the ordering of the factors. Also, in traditional factor models, factors can be arbitrarily rotated without affecting the likelihood. We use an approximate sparsity-inducing prior on the factor loadings to make the rotations identifiable, and this generally works. But the factor ordering would still be an issue, making for a multi-modal posterior.

To partially solve this, we use an ordering prior under which the variance of the elements of Lambda decreases stochastically with increasing row index. However, whenever two factors (rows of Lambda) have equal variance, this ordering is not identifiable in the posterior, and in small sample sizes the ordering may be highly uncertain. Unfortunately (or fortunately, depending on how you look at it), mixing of the factor ordering is very poor in the Gibbs sampler. I could try to add some bigger jumps in a Metropolis-Hastings step, but I think this would be difficult. Instead, I've added the function reorder_factors, which manually swaps factors to avoid the poor mixing of the factor order (see the sketch below for the basic idea). I find that once I get the factors in approximately the right order, the Gibbs sampler is sticky enough that they tend to stay there long enough for sample averages and variances to be useful approximations. Yes, because the posterior is multi-modal this is not the true posterior uncertainty, but I think it is an adequate practical solution.
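Conceptually, the re-ordering step sorts the factors from biggest to smallest by the magnitude of their loadings, applying the same permutation to F so that F %*% Lambda is unchanged. Here is a sketch of that idea only, not the actual reorder_factors implementation:

# Conceptual sketch only (not the reorder_factors code): re-order the factors
# from biggest to smallest by the total magnitude of their loadings, applying
# the same permutation to F so that F %*% Lambda is unchanged.
reorder_by_loading_size = function(F_mat, Lambda) {
  size = rowSums(Lambda^2)              # one magnitude per factor (row of Lambda)
  perm = order(size, decreasing = TRUE)
  list(F = F_mat[, perm, drop = FALSE],
       Lambda = Lambda[perm, , drop = FALSE])
}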

In short, those are the reasons why I am generally comfortable with it as is.

Does this make sense to you? I’ve tried to find other ways to improve the mixing, but because of the inherent label switching problem in factor models, I’m not sure that there is a good solution for this model.

Dan


tpbilton commented 4 years ago

Hi Dan,

I really appreciate your in-depth response. Yes, what you have said makes a lot of sense and sheds a lot of light on what I'm seeing in the results.

One take-away, then, is that the trace plots of Lambda may not be the most important thing to look at (though they are helpful for determining whether you have enough factors); what matters more are the parameters one is actually interested in estimating. In my case, I'm wanting to use MegaLMM mainly for microbiome data and am mostly interested in the beta coefficients, so I'll have a look at the trace plots for those parameters (something like the sketch below).
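For example, something like the following, though I'm guessing at 'B1' as the name under which the fixed-effect coefficients are stored in the posterior database; the actual name would need checking against what the model setup records.

# Hedged sketch: check convergence of the regression coefficients instead of
# Lambda. 'B1' is a guess at the stored parameter name for the fixed effects;
# substitute whichever name the model setup actually records.
library(coda)

B_samples = load_posterior_param(MegaLMM_state, 'B1')  # assumed: samples x covariates x traits

b = mcmc(B_samples[, 1, 3])   # e.g. covariate 1, trait 3
plot(b)                       # trace + density plot from coda
effectiveSize(b)              # rough effective sample size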

I think you should put some of the points you have made into the documentation, maybe as a separate tutorial for more advanced users (though I'm guessing time is probably an issue for you). These are some really important points from a practical point of view when implementing MegaLMM on real data.

Not to take away from the good work you have done, but it would be nice if these issues could be sorted out, or if there were some mathematical theory showing that the posteriors for the parameters of interest are not hugely affected by these convergence issues. I'm keen to see some more development of MegaLMM and happy to help out if possible.

I'll do a bit more playing around with some data and look at the convergence for the actual parameters of interest.

Cheers, Timothy