Stratified QQ - Githubissues

swvanderlaan commented 5 years ago

Hi,

It would be great if you could add a function to make stratified QQ plots, for instance stratified by bins of info-score (e.g. https://github.com/swvanderlaan/MetaGWASToolKit/blob/master/SCRIPTS/plotter.qq_by_info.R) and minor allele frequency (e.g. https://github.com/swvanderlaan/MetaGWASToolKit/blob/master/SCRIPTS/plotter.qq_by_caf.R). These are great diagnostic tools for reviewing which filtering settings are best for the data.

Best,

Sander

YinLiLin commented 5 years ago

Hi Sander,

Thank you for your suggestion. I saw your well-written script; it is a good reference to follow. With your permission, I will give it a try and implement it in CMplot.

regards, Lilin

swvanderlaan commented 5 years ago

Thanks for the compliment 👨🏻‍💻😁

That would be great. Please go ahead...

Do you have a timeline? It would be great if you could add ... I might simply switch to your package ...

swvanderlaan commented 4 years ago

Any progress on this?

YinLiLin commented 4 years ago

Oh, very sorry about that. I missed your response here; my apologies. I remember checking your script, and it seems we need MAF or other information to achieve this, am I right? If so, it may be a little hard to incorporate this function into CMplot, as CMplot only requires SNP, Chr, Pos, Pvalue1, Pvalue2, Pvalue3, ..., which are generally provided by most GWAS software and can be easily prepared by users.

swvanderlaan commented 4 years ago

I would argue that MAF and INFO are available for each GWAS. My idea would be to have stratified QQ plots including lambdas per bin (and potentially counts of variants), as a way to assess the raw results from a GWAS prior to filtering them. Below is an example.

These plots are quite informative.

You are right that for final GWAS summary statistics from meta-analyses INFO might not be available, but for every GWAS the MAF, CAF, EAF, or AF should be available. And all the GWAS software that I work with, SNPTEST and PLINK for instance, produces (raw) results with these variables.

It would be another plot type, of course, and yes, one would have to supply these data as a prerequisite, with plot_type = "qs" for instance.

Would be great.

Happy to help implement it, if you could help explain a bit more about what is what in the CMplot function... :-)

YinLiLin commented 4 years ago

Thanks for providing the examples. I agree with you, and the stratified QQ plot using MAF information is definitely worth a try. To avoid breaking the structure of the current data format, how about adding a parameter 'maf' to CMplot that allows users to input the MAF information? Then we can use it to draw the first figure you showed above.

swvanderlaan commented 4 years ago

Yes, that would be a great idea. I would go for two flags, maf and info, so users would have to run it twice. And it would affect the data structure, I believe, because instead of maf you'd have info as an extra column.

YinLiLin commented 4 years ago

While tweaking the script, I found that this can already be achieved with the current version of CMplot. An example is shown below:

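library(CMplot)
data(pig60K)

# simulate MAF values (for illustration only)
set.seed(123)
maf <- 0.001 + 0.45 * runif(nrow(pig60K))

# assign trait1 p-values to MAF bins; SNPs outside a bin stay NA
p1 <- p2 <- p3 <- rep(NA, nrow(pig60K))
p1[maf < 0.05] <- pig60K$trait1[maf < 0.05]
p2[maf >= 0.05 & maf < 0.1] <- pig60K$trait1[maf >= 0.05 & maf < 0.1]
# use >= so that maf == 0.1 is not dropped and the bin matches its label
p3[maf >= 0.1] <- pig60K$trait1[maf >= 0.1]
data <- cbind(pig60K[, 1:3], pig60K$trait1, p1, p2, p3)
colnames(data)[-c(1:3)] <- c("All", "maf<0.05", "0.05<=maf<0.1", "maf>=0.1")

# draw QQ plots for all four p-value columns
CMplot(data, plot.type = "q", multracks = TRUE, conf.int = FALSE)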

The final visualised result:

Is it consistent with the figures you mentioned above? That means we need to adjust the format of the data manually prior to plotting.
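
The manual reshaping in the example above could be wrapped in a small helper. The following is only a minimal sketch in base R: `stratify_pvalues` and its arguments are hypothetical names that are not part of CMplot, and the p-values and MAF values are simulated.

```r
# Hypothetical helper (not part of CMplot): build one p-value column per bin
# of a stratifying variable such as MAF or INFO; values outside a bin are NA.
stratify_pvalues <- function(p, strat, breaks, prefix = "maf") {
  # right = FALSE gives left-closed bins [a, b); include.lowest = TRUE also
  # closes the last bin on the right so the top break is not dropped
  bins <- cut(strat, breaks = breaks, right = FALSE, include.lowest = TRUE)
  mat <- vapply(levels(bins), function(b) {
    col <- rep(NA_real_, length(p))
    idx <- which(bins == b)   # which() silently skips NA bin assignments
    col[idx] <- p[idx]
    col
  }, numeric(length(p)))
  out <- as.data.frame(mat)
  names(out) <- paste(prefix, levels(bins))
  out
}

# Simulated input, for illustration only
set.seed(123)
n   <- 1000
p   <- runif(n)
maf <- 0.001 + 0.45 * runif(n)

strata <- stratify_pvalues(p, maf, breaks = c(0, 0.05, 0.1, 0.5))
colSums(!is.na(strata))  # number of variants in each bin
```

The resulting columns could then be bound to the SNP, chromosome, and position columns and passed to CMplot in the same way as p1, p2, and p3 above. Using `cut()` keeps the bin boundaries in one place, so the column labels cannot drift out of step with the conditions.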

swvanderlaan commented 4 years ago

Yes, this is a good start. Very good. Only thing lacking is the lambda per bin. That way you can assess whether there is inflation originating from that bin or not.
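
For reference, lambda per bin can be computed outside the plotting call. This is a minimal sketch in base R, assuming the usual genomic inflation factor (the median observed 1-df chi-square statistic divided by its expected median under the null); `lambda_gc` is a hypothetical helper name, not a CMplot function, and the data are simulated.

```r
# Genomic inflation factor: median observed 1-df chi-square statistic divided
# by the median expected under the null, qchisq(0.5, df = 1) (about 0.455)
lambda_gc <- function(p) {
  p <- p[!is.na(p)]
  chisq <- qchisq(p, df = 1, lower.tail = FALSE)
  median(chisq) / qchisq(0.5, df = 1)
}

# Simulated null p-values and MAF values, for illustration only
set.seed(123)
n   <- 20000
p   <- runif(n)
maf <- 0.001 + 0.45 * runif(n)

# Same bins as in the example above: <0.05, [0.05, 0.1), >=0.1
bins <- cut(maf, breaks = c(0, 0.05, 0.1, 0.5), right = FALSE, include.lowest = TRUE)

per_bin <- data.frame(
  n_variants = as.vector(table(bins)),
  lambda     = as.vector(tapply(p, bins, lambda_gc)),
  row.names  = levels(bins)
)
print(per_bin)
```

Because the simulated p-values are uniform, every bin should give a lambda close to 1; in real data, a bin with a clearly larger lambda would point to inflation coming from that bin. The values could be appended to the column names so that they show up in the plot legend.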

YinLiLin / CMplot

Stratified QQ #13