gc5k / GEAR

GEAR [GEnetic Analysis Repository], contact chenguobo@gmail.com;
https://github.com/gc5k/GEAR/wiki

Run EigenGWAS with more than two groups #5

Closed biozzq closed 3 years ago

biozzq commented 3 years ago

Dear all,

I wonder if I can run EigenGWAS when I have more than ten groups. Thank you.

Best wishes, Zheng zhuqing

gc5k commented 3 years ago

Since EigenGWAS is an unsupervised method, it has no prior knowledge of how many groups there are. EigenGWAS learns from the data itself how best to group it.

My suggestion is to try it first, then see whether the results make sense.

biozzq commented 3 years ago

Dear @gc5k

Thank you. I tried it on my data. The command and logs are as follows. The Lambda GC is 126.6, indicating substantial population stratification; however, no significant sites were identified after GC correction. What do you think about my data? Can I use the raw p-values to identify loci under selection?

java -jar -Xms10G -Xmx100G gear.jar eigengwas --bfile plink --ev 2 --out out

[INFO] 542 individuals were matched for analysis.
[INFO] 
[INFO] Calculating locus statistics with 1 threads.
[INFO] Average MAF: 0.2475
[INFO] Average variance: 0.3814
[INFO] Average missing rate: 0.0070
[INFO] Calculating eGWAS with 1 thread.
[INFO] Median of p values is 3.241851231905457E-14
[INFO] Lambda GC is: 126.59680460266985
[INFO] 542 individuals were matched for analysis.
[INFO] 
[INFO] Calculating locus statistics with 1 threads.
[INFO] Average MAF: 0.2475
[INFO] Average variance: 0.3814
[INFO] Average missing rate: 0.0070
[INFO] Calculating eGWAS with 1 thread.
[INFO] Median of p values is 1.4095455993179407E-4
[INFO] Lambda GC is: 31.85171494145232

Sincerely, Zheng zhuqing
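
For readers following the log above: a minimal sketch of how a Lambda GC value can be derived from the median p-value, assuming the standard genomic-control definition (the 1-df chi-square statistic at the median p-value, divided by the null median of about 0.4549). The thread does not confirm that GEAR computes it this way, and the function names here are hypothetical.

```python
from statistics import NormalDist

# Median of a 1-df chi-square distribution, i.e. the expected median under the null.
CHI2_1DF_MEDIAN = 0.4549364


def chi2_from_p(p):
    """Convert a p-value to the matching 1-df chi-square statistic."""
    # Work in the lower tail (p / 2) so that very small p-values keep precision.
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def lambda_gc_from_median_p(median_p):
    """Genomic inflation factor implied by the median p-value (assumed definition)."""
    return chi2_from_p(median_p) / CHI2_1DF_MEDIAN


# Median p-values copied from the two log blocks above.
for median_p in (3.241851231905457e-14, 1.4095455993179407e-4):
    print(median_p, lambda_gc_from_median_p(median_p))
```

If GEAR uses this definition, the two printed values should come out close to the logged Lambda GC values of about 126.6 and 31.85.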

gc5k commented 3 years ago

Your command looks right and the output seems okay. My guess is that the sample is very unlikely to be a natural population, but rather a breeding population that has been highly selected. If you draw a Manhattan plot, the signals may, or should, be around the p-value threshold but not yet exceed it.

This is over-correction ("overkill") of the signals, which we have seen with certain kinds of data, such as yours. We have already fixed the problem but haven't published the algorithm yet.
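
To illustrate the over-correction described here: a minimal sketch, assuming the textbook genomic-control correction in which each 1-df chi-square statistic is divided by Lambda GC before the p-value is recomputed. This is not the unpublished refined algorithm mentioned above, the thread does not show the exact correction GEAR applies, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_from_p(p):
    """Convert a p-value to the matching 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower tail keeps precision for tiny p
    return z * z


def p_from_chi2(chi2):
    """Upper-tail probability of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def gc_correct(p_raw, lambda_gc):
    """Textbook GC correction: shrink the statistic by lambda, then recompute p."""
    return p_from_chi2(chi2_from_p(p_raw) / lambda_gc)


LAMBDA_GC = 126.59680460266985  # value reported in the first log block

# Hypothetical raw p-values, chosen only to show the size of the effect.
for p_raw in (1e-8, 1e-50, 1e-200):
    print(p_raw, gc_correct(p_raw, LAMBDA_GC))
```

With a Lambda GC this large, even extremely small raw p-values are pulled far back toward 1 after correction, which is consistent with no site passing a genome-wide threshold.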

biozzq commented 3 years ago

Dear @gc5k

All the Pgc values are nearly 1. Do you mean that I should use the raw p-values to generate the Manhattan plot? This population was sampled worldwide, and population structure analysis indicated that these individuals cluster by geographic distribution, except for the admixed ones. The population also includes both wild and domestic samples.

Thanks for your efforts. I look forward to seeing your new algorithm.

Sincerely, Zheng zhuqing

gc5k commented 3 years ago

You may send over the *.egwas file and we can make a refined correction for you.

chen.guobo@foxmail.com.

biozzq commented 3 years ago

Dear @gc5k

Thank you. As the *1.egwas file is about 700 MB after compression, I have uploaded it to Google Drive; you can download it using the following link: https://drive.google.com/file/d/1IvILrv6UPTW3E0pB2UVGrWXH69s3L4mH/view?usp=sharing

Sincerely, Zheng zhuqing

gc5k commented 3 years ago

We will take a look. Hold on.

gc5k commented 3 years ago

What's your email?

biozzq commented 3 years ago

Dear @gc5k

Here is my email address, zzq1207@126.com

Thank you.

Sincerely, Zheng zhuqing