The number of causals when running MCMC

gkichaev / PAINTOR_V3.0

Fast, integrative fine mapping with functional data


The number of causals when running MCMC #23

Closed (variani closed this issue 6 years ago)

variani commented 6 years ago

Hi,

I am trying to run MCMC (the -mcmc argument) and specify the number of causal SNPs (the -max_causal argument). It seems that the MCMC procedure ignores -max_causal. Is there any specification of the number of causal SNPs and their prior probabilities when running MCMC?

When running -enumerate 2, the output seems consistent: the posterior probabilities of SNPs being causal sum to 2. With -mcmc, however, this sum looks arbitrary, e.g. 4.7 or 6.4.
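To make the observation above concrete: the sum of per-SNP posterior probabilities equals the posterior expected number of causal SNPs, so it cannot exceed the cap when configurations are enumerated up to a fixed size. The sketch below is a toy illustration only. It does not call PAINTOR, the weights are invented, and all function names are hypothetical.

```python
from itertools import combinations


def enumerate_configs(n_snps, max_causal):
    """Yield every causal configuration with at most max_causal SNPs."""
    for k in range(max_causal + 1):
        for combo in combinations(range(n_snps), k):
            yield frozenset(combo)


def marginal_pips(weights, n_snps):
    """Per-SNP posterior inclusion probabilities from unnormalized config weights."""
    total = sum(weights.values())
    pips = [0.0] * n_snps
    for config, w in weights.items():
        for i in config:
            pips[i] += w / total
    return pips


def expected_causal_count(weights):
    """Posterior mean of the number of causal SNPs."""
    total = sum(weights.values())
    return sum(len(c) * w for c, w in weights.items()) / total


# Toy example: 5 SNPs, at most 2 causal, arbitrary made-up weights
# (assumption: these stand in for prior x likelihood of each configuration).
n_snps, max_causal = 5, 2
weights = {c: 1.0 + 0.5 * sum(c) for c in enumerate_configs(n_snps, max_causal)}

pips = marginal_pips(weights, n_snps)
print("sum of PIPs:", sum(pips))
print("expected number of causal SNPs:", expected_causal_count(weights))
```

Under this reading, a sum of 4.7 or 6.4 would mean the sampler is visiting configurations with more causal SNPs than the enumeration cap used for comparison.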

Basically, I would expect the same behavior as in FINEMAP (http://www.christianbenner.com/), where the controls are given with --n-causal-snps and --prior-k arguments.

Thanks.

gkichaev commented 6 years ago

Yes, the sum will have some randomness attached to it. You can try running the chain for a larger number of samples and/or running PAINTOR multiple times and then averaging over the posteriors. This should give more robust estimates.
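A minimal sketch of the averaging step suggested above, assuming the per-SNP posteriors from each run have already been read into plain lists in the same SNP order. The function name and the toy numbers are hypothetical, and reading PAINTOR's output files is deliberately left out.

```python
from statistics import mean, stdev


def average_posteriors(runs):
    """Average per-SNP posteriors across runs and report the between-run spread.

    runs: list of lists, one inner list of posteriors per run,
          all in the same SNP order (assumption).
    """
    if not runs:
        raise ValueError("need at least one run")
    n_snps = len(runs[0])
    if any(len(r) != n_snps for r in runs):
        raise ValueError("all runs must cover the same SNPs")
    per_snp = list(zip(*runs))  # regroup values by SNP
    avg = [mean(values) for values in per_snp]
    # Standard deviation needs at least two runs; report 0.0 otherwise.
    spread = [stdev(values) if len(values) > 1 else 0.0 for values in per_snp]
    return avg, spread


# Toy posteriors from three hypothetical MCMC runs over three SNPs.
runs = [
    [0.9, 0.1, 0.5],
    [0.7, 0.3, 0.5],
    [0.8, 0.2, 0.5],
]
avg, spread = average_posteriors(runs)
for i, (a, s) in enumerate(zip(avg, spread)):
    print(f"SNP {i}: mean posterior {a:.3f}, between-run sd {s:.3f}")
```

A large between-run spread for a SNP is a hint that the chains have not converged and that more samples may be needed.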

Since the likelihood model is quite similar for both FINEMAP and PAINTOR, you could consider running both of them to see if you get roughly similar results.

variani commented 6 years ago

Thanks for your helpful comments.

One of my motivations for asking about this parametrization was exactly such comparisons, e.g. FINEMAP vs. PAINTOR. But that does not seem possible to me without PAINTOR controls analogous to the --n-causal-snps and --prior-k arguments (running practical examples leads to the same conclusion). Further, I don't see a way to compare PAINTOR's own two modes, -enumerate 3 and -mcmc, because the number of causals goes beyond 3 in the latter case (I guess).

What is the best resource for understanding the parametrization of the sampling procedure? Is there a particular PAINTOR paper?

Thanks again for your answers.

variani commented 6 years ago

My other 5 cents on why it would be handy to have a control like --n-causal-snps. According to the recent review from the FINEMAP authors (PMID 28942963), misspecification of LD is a major issue in real data analysis, especially for large-scale datasets. When "the posterior probability of the number of causal variants concentrates on the maximum value possible", it is an indication of the LD mismatch problem. This sanity check was very useful in my real data analysis for coping with inconsistent fine-mapping results.

I guess you can close the ticket. Thanks.
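For reference, the sanity check quoted above can be sketched as follows, assuming one has access to sampled causal configurations (for example from an MCMC chain). This is an illustration only: the input format, the function names, and the 0.5 threshold are all hypothetical and not taken from PAINTOR or FINEMAP.

```python
from collections import Counter


def causal_count_posterior(sampled_configs):
    """Estimate the posterior over the number of causal variants from samples."""
    if not sampled_configs:
        raise ValueError("need at least one sampled configuration")
    counts = Counter(len(config) for config in sampled_configs)
    n = len(sampled_configs)
    return {k: v / n for k, v in sorted(counts.items())}


def concentrates_on_max(count_posterior, max_causal, threshold=0.5):
    """Flag a possible LD mismatch: most mass sits on the maximum allowed count.

    threshold is an arbitrary illustrative cutoff (assumption).
    """
    return count_posterior.get(max_causal, 0.0) >= threshold


# Toy samples: each set holds the indices of SNPs marked causal in one draw.
samples = [{1, 4}, {1, 4, 7}, {1, 4, 7}, {2, 4, 7}, {1, 4, 7}]
max_causal = 3

posterior = causal_count_posterior(samples)
print("posterior over number of causal variants:", posterior)
print("concentrates on maximum:", concentrates_on_max(posterior, max_causal))
```

The check only makes sense when the maximum number of causal variants is actually enforced, which is the control requested in this thread.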