genetics-statistics / GEMMA

Genome-wide Efficient Mixed Model Association
https://github.com/genetics-statistics/GEMMA
GNU General Public License v3.0

MAF filtering when using -gxe #156

Open gdevailly opened 6 years ago

gdevailly commented 6 years ago

Hello,

When using gemma (v0.97) with the -gxe parameter to detect interactions and the -maf option set to 0.05, the MAF filtering is applied globally rather than per environment. As a result, gemma reports highly significant interactions that are supported by only one or two samples. For example, this SNP interaction had p_wald < 10^-40 while being supported by a single sample.

[Image: rplot] (numbers in brackets are the sample sizes)

Should the MAF filtering be done per environment instead?

That is what I will do from now on, before feeding the genotypes to gemma (I also cleaned the phenotype data).

Many thanks, and sorry if this is not relevant or was already discussed elsewhere (I could not find it).

Guillaume

pjotrp commented 6 years ago

@xiangzhou can you make a suggestion?

xiangzhou commented 6 years ago

I don't know if this is due to algorithm instability when there is only one AG sample in the trop environment. To check this, you can try the score test instead of the Wald test. The score test is more stable than the Wald test. If the score test doesn't help, then perhaps we should perform MAF filtering per environment.

pcarbo commented 6 years ago

@gdevailly I agree with Xiang; the Wald test is unreliable, especially for small samples. In my experience the likelihood ratio test is most reliable; I don't recall if that is available with the -gxe option.

Also, filtering out SNPs based on MAF, separately in each environment, sounds like an excellent idea.
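
For illustration, here is a minimal sketch of per-environment MAF filtering done as a preprocessing step, before the genotypes are passed to gemma. It is not GEMMA's own code. It assumes genotypes are coded as allele dosages (0, 1, 2, or None when missing) with one environment label per sample, and the function names `maf_by_environment` and `passes_in_every_environment` are hypothetical. The toy data mimics the situation described above: a SNP that passes a global 0.05 threshold while being carried by a single sample in one environment.

```python
from collections import defaultdict


def maf_by_environment(dosages, environments):
    """Return {environment: MAF} for a single SNP.

    dosages: per-sample allele counts (0, 1 or 2), or None if missing.
    environments: per-sample environment labels, in the same order.
    """
    allele_sum = defaultdict(float)
    n_called = defaultdict(int)
    for dosage, env in zip(dosages, environments):
        if dosage is None:
            continue  # skip missing genotypes
        allele_sum[env] += dosage
        n_called[env] += 1

    mafs = {}
    for env in set(environments):
        if n_called[env] == 0:
            # Assumption: a SNP with no called genotypes in an
            # environment is treated as failing the filter there.
            mafs[env] = 0.0
        else:
            freq = allele_sum[env] / (2 * n_called[env])
            mafs[env] = min(freq, 1.0 - freq)
    return mafs


def passes_in_every_environment(dosages, environments, threshold=0.05):
    """True only if the SNP reaches the MAF threshold in each environment."""
    mafs = maf_by_environment(dosages, environments)
    return all(maf >= threshold for maf in mafs.values())


# Toy data: 20 samples per environment (labels are made up).
environments = ["temp"] * 20 + ["trop"] * 20
# All "temp" samples are heterozygous; only one "trop" sample is.
snp = [1] * 20 + [1] + [0] * 19

global_maf = sum(snp) / (2 * len(snp))          # 21 / 80 = 0.2625
per_env = maf_by_environment(snp, environments)  # temp: 0.5, trop: 0.025
keep = passes_in_every_environment(snp, environments, threshold=0.05)
print(global_maf, per_env, keep)
```

In this toy case the global MAF is well above 0.05, so a global filter keeps the SNP, while the per-environment check drops it because only one "trop" sample carries the minor allele.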