genetics-statistics / GEMMA

Genome-wide Efficient Mixed Model Association
https://github.com/genetics-statistics/GEMMA
GNU General Public License v3.0

gemma bslmm problem #106

Closed hxtao closed 6 years ago

hxtao commented 6 years ago

@xiangzhou Hi. First of all, thanks for your BSLMM model. When I used the software for GWAS, the output looked like this: `Command Line Input = -bfile fw -k fw-cen.txt -bslmm 1 -w 1000 -s 10000 -o fw`

So I don't know how to perform multiple testing and detect significant SNPs.

pcarbo commented 6 years ago

@hxtao The BSLMM model is not intended for performing association tests. Instead, you should use the linear regression (-lm) or LMM (-lmm) models, or the multivariate versions of these.

hxtao commented 6 years ago

@Peter Thanks for your reply. I have read the paper "Polygenic Modeling with Bayesian Sparse Linear Mixed Models", where the BSLMM model is compared with the LMM model in GWAS. The paper emphasizes the differences between the models, but I want to detect the prominent SNPs that are associated with the phenotypes.

So I don't know how to detect them when using the BSLMM model.


xiangzhou commented 6 years ago

If you do want to use BSLMM for testing, you can extract gamma values from the *.assoc.txt file. The gamma value is between 0 and 1, and represents the posterior probability of association in the presence of population stratification (i.e. treating alpha as the population stratification effects). You can use gamma to prioritize SNPs, and you can use some standard cutoffs (say 0.5, 0.9 or 0.99) to declare significance. This approach is different from marginal association tests in lmm/mvlmm and accounts for linkage disequilibrium among SNPs. But you do need to run the model for a long time in order to get stable results.

hxtao commented 6 years ago

@xiangzhou Thanks for your reply. I see that the model needs to run for a long time. I just want to ask for an example or a paper with more details. This is my first time doing GWAS with GEMMA, and I know I am a beginner, so I need an example to explain it.

angelaparodymerino commented 6 years ago

Hi @xiangzhou and @pcarbo,

I have the same question as @hxtao. I would like to draw conclusions about the association between a phenotype and some SNPs from my BSLMM results.

I ran:

gemma -g bimbampcas2.gen -k k_thin_100000.txt -p pheno.txt -notsnp -bslmm 1 -o out1

And the outputs I got are 5 .txt files (out1.bv, out1.gamma, out1.hyp, and out1.log), but none of them is a .assoc.txt file. Where can I get the gamma values (the posterior probability of association) you were talking about?

And my second question is regarding what you said:

But you do need to run the model for a long time in order to get stable results.

I suppose that this refers to the two parameters (burn-in iterations and sampling iterations), which can be modified using the -w and -s options. Do you mean that we should run BSLMM several times, increasing -w and -s each time until we reach stable results? Is this correct? Or how can I establish the values of -w and -s that give stable results?

Thanks in advance,

Regards,

Angela Parody-Merino

pcarbo commented 6 years ago

@angelaparodymerino For each SNP, there are two effect estimates: the small ("polygenic") effect, and the large ("QTL") effect. The right-most column in the *bslmm.param.txt file gives the probability that the large effect is non-zero. You can use this statistic for association testing, as Xiang recommended. Alternatively, you can use the *gamma.txt.gz file, but extracting this information from this file is more complicated.
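
As a minimal sketch of the approach described above, the following Python reads a whitespace-delimited table with a header row, treats the right-most column as the posterior probability that the large effect is non-zero, and keeps the SNPs at or above a cutoff. The file layout, the column names in the example data (such as `rs` and `gamma`), and the helper name `high_pip_snps` are assumptions for illustration; check them against your own *.param.txt file before relying on this.

```python
import io

def high_pip_snps(handle, cutoff=0.5, id_column="rs"):
    """Return (snp_id, pip) pairs with pip >= cutoff, sorted by decreasing pip.

    Assumes a whitespace-delimited table with one header row, where the
    right-most column holds the posterior inclusion probability.
    """
    header = handle.readline().split()
    id_index = header.index(id_column)  # assumed name of the SNP id column
    hits = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pip = float(fields[-1])  # right-most column
        if pip >= cutoff:
            hits.append((fields[id_index], pip))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits

# Made-up example data for illustration, not real GEMMA output.
example = io.StringIO(
    "chr\trs\tps\tn_miss\talpha\tbeta\tgamma\n"
    "1\tsnp1\t100\t0\t0.001\t0.20\t0.95\n"
    "1\tsnp2\t200\t0\t0.002\t0.00\t0.01\n"
    "2\tsnp3\t300\t0\t0.000\t0.10\t0.60\n"
)
print(high_pip_snps(example, cutoff=0.5))
```

With a real run you would pass an open file handle instead of the `io.StringIO` example, and pick the cutoff (for instance 0.5, 0.9 or 0.99) that suits your tolerance for false positives.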

@xiangzhou You mentioned the *.assoc.txt file, but I believe that there is no *.assoc.txt file from the -bslmm option. Is this correct?

@angelaparodymerino Regarding the -w and -s options, there are no good guidelines on how long a Markov chain should be run in MCMC. What I would suggest is try increasingly larger values for -s and compare the results in these different runs. For larger -s, the burn-in parameter -w does not matter, so you can set it to 0.
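
One way to compare runs with different -s values, sketched here under the assumption that you have already collected each run's per-SNP posterior inclusion probabilities into a dictionary, is to look at the largest change in any SNP's probability between two runs. The function name and the example numbers are made up, and what counts as "small enough" is a judgment call rather than a GEMMA rule.

```python
def max_pip_change(run_a, run_b):
    """Largest absolute difference in PIP over SNPs present in both runs."""
    shared = set(run_a) & set(run_b)
    if not shared:
        raise ValueError("the two runs share no SNP ids")
    return max(abs(run_a[snp] - run_b[snp]) for snp in shared)

# Made-up PIPs from a shorter and a longer chain, for illustration only.
short_run = {"snp1": 0.80, "snp2": 0.05, "snp3": 0.40}
long_run = {"snp1": 0.90, "snp2": 0.02, "snp3": 0.65}
print(round(max_pip_change(short_run, long_run), 2))
```

If the largest change keeps shrinking as -s grows, that is some evidence the chain is long enough; a large change for a top-ranked SNP suggests running longer.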

However, my general advice is to caution you against using the BSLMM model for association testing. As far as I know, there is only one paper (in Genetics) that has proposed using the BSLMM for genetic association tests, and it has several issues. By contrast, the LMM model (-lmm) has been widely used for association tests, and is much more straightforward to use. Alternatively, you can use the "sparse" version of the BSLMM model by setting -rmin 1, and this is more straightforward to interpret for association tests, but will not control for population structure and/or unequal relatedness.

xiangzhou commented 6 years ago

Yes, this is correct. It should be the param.txt file.

angelaparodymerino commented 6 years ago

Thanks for your answers. Very helpful.

I read in a paper (Delmore et al., 2016) the following:

It would have been an option to use posterior inclusion probabilities from BSLMMs to meet this objective but BSLMM do not allow you to include covariates and for our analyses of color we wanted to include sex as a covariate.

Is the reason why BSLMM shouldn't be used for testing association that it doesn't account for covariates? Then, what happens if I don't need to consider any covariate? I am thinking about standardizing my phenotype by year and by site, so that I don't need to include them in the model. I am wondering if in this case it would be useful/acceptable/appropriate to look at gamma as indicative of the association of each SNP with the phenotype under the hybrid model that BSLMM offers, which might be more powerful than an LMM since it takes into account two possible genetic architecture scenarios (polygenic vs. mono/oligogenic basis).

Thanks,

Angela Parody-Merino

pcarbo commented 6 years ago

@angelaparodymerino Please see this paper for a discussion of some of the issues in using the BSLMM model to implement association tests (they use a model that is very similar to BSLMM). The issues are not immediately intuitive and require a deeper understanding of how the models are fitted. The short story is that the BSLMM suffers a loss of power because the "large" effects can be under-estimated due to the inclusion of small effects in the model. This is the same problem with all LMMs, but it is made worse by simultaneously learning a prior. The paper I cited attempts to circumvent these issues.

I would recommend trying both the -lmm and -bslmm options and comparing the results.
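
A rough way to compare the two analyses, assuming you have already loaded the LMM p-values and the BSLMM posterior inclusion probabilities into dictionaries keyed by SNP id, is to check which SNPs rank near the top in both. The function name, the inputs, and the numbers below are hypothetical; this is a sketch of one possible comparison, not a recommended test.

```python
def top_k_overlap(lmm_pvalues, bslmm_pips, k=10):
    """SNP ids that rank in the top k of both analyses.

    LMM results are ranked by increasing p-value, BSLMM results by
    decreasing posterior inclusion probability.
    """
    top_lmm = sorted(lmm_pvalues, key=lmm_pvalues.get)[:k]
    top_bslmm = sorted(bslmm_pips, key=bslmm_pips.get, reverse=True)[:k]
    return sorted(set(top_lmm) & set(top_bslmm))

# Made-up values for illustration only.
lmm = {"snp1": 1e-8, "snp2": 0.5, "snp3": 1e-3}
bslmm = {"snp1": 0.90, "snp2": 0.80, "snp3": 0.10}
print(top_k_overlap(lmm, bslmm, k=2))
```

SNPs that appear in only one of the two top lists are the ones worth a closer look, for example for linkage disequilibrium with a stronger nearby signal.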

angelaparodymerino commented 6 years ago

Many thanks @pcarbo for your answer and for the link to that paper.

Would it be correct to say then that BSLMM is useful for us (scientists studying the genetics behind phenotypes) because it gives us an idea of the possible genetic architecture behind the phenotype? In other words, to test whether the link between the phenotype(s) and the SNPs would fit better under a polygenic (many SNPs with small effects, LMM) vs. oligogenic (a few SNPs with stronger effects, BVSR) model? And also to estimate the "chip heritability" (h) or, in other words, how much of the phenotypic variation is explained by the SNPs under the BSLMM model?

I know it is pretty well explained in your paper "Polygenic Modeling with Bayesian Sparse Linear Mixed Models", but I just need reassurance.

Thanks,

Angela Parody-Merino

pjotrp commented 6 years ago

Please discuss on the mailing list: https://groups.google.com/forum/#!forum/gemma-discussion