Open swvanderlaan opened 5 years ago
Hi Sander,
Thank you for your suggestion. I saw your well-written script; it is a good reference to follow, and with your permission I will try to implement it in CMplot.
regards, Lilin
Thanks for the compliment 👨🏻💻😁
That would be great. Please go ahead...
Do you have a timeline? It would be great if you could add ... I might simply switch to your package ...
Any progress on this?
Oh, very sorry about that.
I missed your response here, my apologies. I remember checking your script, and it seems that we need MAF or other information to achieve this, am I right? If so, it may be a little hard to incorporate this function into CMplot, as CMplot only requires the columns SNP Chr Pos Pvalue1 Pvalue2 Pvalue3..., which are generally provided by most GWAS software and can easily be prepared by users.
I would argue that MAF and INFO are available for each GWAS. My idea would be to have stratified QQ plots including lambdas per bin (and potentially counts of variants), as a way to assess the raw results from a GWAS prior to filtering them. Below is an example.
These plots are quite informative.
You are right that for final GWAS summary statistics from meta-analyses INFO might not be available, but for every GWAS the MAF or CAF or EAF or AF should be available. And all GWAS software that I work with (SNPTEST and PLINK, for instance) produces raw results with these variables.
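Since results may report a coded or effect allele frequency rather than the MAF itself, a stratified plot would first need a common scale. A minimal sketch, assuming biallelic variants; `af_to_maf` is a hypothetical helper name, not something from CMplot or the scripts linked in this thread:

```r
# Hypothetical helper: fold an allele frequency (CAF/EAF/AF) into a MAF.
# Assumes biallelic variants, so the minor allele frequency is min(AF, 1 - AF).
af_to_maf <- function(af) {
  stopifnot(is.numeric(af), all(af >= 0 & af <= 1, na.rm = TRUE))
  pmin(af, 1 - af)  # missing frequencies stay NA
}

# Made-up effect allele frequencies, for illustration only
eaf <- c(0.02, 0.50, 0.93, NA)
maf <- af_to_maf(eaf)
maf
```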
It would be another plot type of course, and yes, one would have to supply these data as a prerequisite, plot_type = "qs" for instance.
Would be great.
Happy to help implement it, if you could help explain a bit more about what is what in the CMplot function... :-)
Thanks for providing the examples. I agree with you, and the stratified QQ plot using MAF information is definitely worth a try. To avoid breaking the structure of the current data format, how about adding a parameter 'maf' to CMplot that allows users to input the MAF information? Then we can use it to draw the figure 1 you showed above.
Yes, that would be a great idea. I would go for two flags, maf and info, so users would have to run it twice. And it would interfere with the data structure, I believe, because instead of maf you'd have info as an extra column.
While tweaking the script, I found that this can already be achieved with the current version of CMplot. An example is shown below:
library(CMplot)
data(pig60K)
# Simulate MAF values for illustration
set.seed(123)
maf <- 0.001 + 0.45 * runif(nrow(pig60K))
# Assign trait1 p-values to groups by MAF; variants outside a group stay NA
p1 <- p2 <- p3 <- rep(NA, nrow(pig60K))
p1[maf < 0.05] <- pig60K$trait1[maf < 0.05]
p2[maf >= 0.05 & maf < 0.1] <- pig60K$trait1[maf >= 0.05 & maf < 0.1]
# Use >= so that maf == 0.1 is included, matching the column label
p3[maf >= 0.1] <- pig60K$trait1[maf >= 0.1]
data <- cbind(pig60K[, 1:3], pig60K$trait1, p1, p2, p3)
colnames(data)[-c(1:3)] <- c("All", "maf<0.05", "0.05<=maf<0.1", "maf>=0.1")
# QQ plot across the MAF groups
CMplot(data, plot.type = "q", multracks = TRUE, conf.int = FALSE)
The final visualised result:
Is it consistent with the figures you mentioned above? That means we need to adjust the format of the data manually prior to plotting.
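The manual preparation above could be wrapped in a small helper so the same code serves MAF and INFO bins. This is only a sketch: `stratify_pvalues` is a hypothetical function, not part of CMplot, and the p-values and MAF below are simulated.

```r
# Hypothetical helper (not part of CMplot): split one p-value vector into
# one column per bin of a stratifying variable such as MAF or INFO.
stratify_pvalues <- function(p, strat, breaks, labels = NULL) {
  stopifnot(length(p) == length(strat))
  # right = FALSE gives left-closed bins [a, b); include.lowest closes the last bin
  bin <- cut(strat, breaks = breaks, labels = labels,
             right = FALSE, include.lowest = TRUE)
  # One column per bin: keep p where the variant falls in the bin, NA elsewhere
  per_bin <- sapply(levels(bin), function(b) {
    ifelse(!is.na(bin) & bin == b, p, NA_real_)
  })
  cbind(All = p, per_bin)
}

# Simulated p-values and MAF, for illustration only
set.seed(123)
p   <- runif(1000)
maf <- 0.001 + 0.45 * runif(1000)
pmat <- stratify_pvalues(p, maf, breaks = c(0, 0.05, 0.1, 0.5),
                         labels = c("maf<0.05", "0.05<=maf<0.1", "maf>=0.1"))
colSums(!is.na(pmat))  # number of variants per column
```

The resulting matrix can be bound to the SNP, chromosome and position columns and passed to CMplot in the same way as the hand-built data in the example above.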
Yes, this is a good start. Very good. The only thing lacking is the lambda per bin. That way you can assess whether or not there is inflation originating from a given bin.
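For reference, a lambda per bin could be computed roughly as follows. This sketch uses the common median-based genomic inflation factor for 1-df tests; `gc_lambda` is a hypothetical helper name, and the p-values and MAF are simulated rather than taken from a real GWAS.

```r
# Genomic inflation factor: median observed chi-square (1 df) statistic
# divided by the expected median under the null.
gc_lambda <- function(p) {
  p <- p[!is.na(p)]
  chisq <- qchisq(p, df = 1, lower.tail = FALSE)
  median(chisq) / qchisq(0.5, df = 1)
}

# Simulated p-values and MAF, for illustration only
set.seed(123)
p   <- runif(10000)
maf <- 0.001 + 0.45 * runif(10000)
bin <- cut(maf, breaks = c(0, 0.05, 0.1, 0.5),
           right = FALSE, include.lowest = TRUE)

# Variant count and lambda per MAF bin
per_bin <- data.frame(
  n      = as.vector(table(bin)),
  lambda = as.vector(tapply(p, bin, gc_lambda)),
  row.names = levels(bin)
)
per_bin
```

With null (uniform) p-values every bin should give a lambda close to 1; a bin with a clearly larger value would point to inflation coming from that stratum.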
Hi,
It would be great if you could add a function to make stratified QQ plots, for instance stratified by bins of info-score (e.g. https://github.com/swvanderlaan/MetaGWASToolKit/blob/master/SCRIPTS/plotter.qq_by_info.R) and minor allele frequency (e.g. https://github.com/swvanderlaan/MetaGWASToolKit/blob/master/SCRIPTS/plotter.qq_by_caf.R). These are great diagnostic tools for reviewing what the best filtering settings are for the data.
Best,
Sander