Cloufield / gwaslab

A Python package for handling and visualizing GWAS summary statistics. https://cloufield.github.io/gwaslab/
GNU General Public License v3.0

Manhattan plot issue #106

Closed MC4R closed 3 weeks ago

MC4R commented 3 weeks ago

Hi,

Thank you very much for this wonderful package. I was trying to create a Manhattan plot from Alzheimer's disease GWAS data with the following code (GWASLab v3.4.31):

import gwaslab as gl

mysumstats = gl.Sumstats("ad.gwas.ma",
             snpid="SNP",
             chrom="CHR",
             pos="BP",
             p="p",
             sep="\t")
mysumstats.plot_mqq(mode="m",
    build="19",
    anno="GENENAME",
    marker_size=(2,2),
    figargs={"figsize":(20,5),"dpi":500},
    #anno_style="tight",
    save="my_first_mqq_plot1.png", 
    #save_args={"dpi":400,"facecolor":"white"},
    #repel_force=0.2,
    #use_rank=True,
    xpad=0.01,
    colors=["#31A0CC", "#126080"],
    sig_line=True, 
    sig_level=5e-8, 
    sig_line_color="red",
    sc_linewidth=1)

However, this code does not annotate the gene names properly. The output I get is:

my_first_mqq_plot

Do you have any thoughts on what might be going wrong here?

Thank you very much for your time. Cheers, Dev

Cloufield commented 3 weeks ago

Hi Dev, would you please try the latest version, v3.4.47? v3.4.31 is a little old. I also just noticed a bug in xpad, but you can set xpadl=0.01 and xpadr=0.01 to get the same effect.

mysumstats.plot_mqq(mode="m",
    build="19",
    anno="GENENAME",
    marker_size=(2,2),
    figargs={"figsize":(20,5),"dpi":500},
    save="my_first_mqq_plot1.png", 
    xpadl=0.01,
    xpadr=0.01,
    colors=["#31A0CC", "#126080"],
    sig_line=True, 
    sig_level=5e-8, 
    sig_line_color="red",
    sc_linewidth=1)

image

MC4R commented 3 weeks ago

Hi,

Thank you. I tried plotting the Alzheimer's GWAS with v3.4.47, but the issue still persists. Please see the log below.

2024/08/21 10:35:16 Start to create MQQ plot...v3.4.47:
2024/08/21 10:35:16  -Genomic coordinates version: 19...
2024/08/21 10:35:16  -Genome-wide significance level to plot is set to 5e-08 ...
2024/08/21 10:35:16  -Raw input contains 10687077 variants...
2024/08/21 10:35:16  -MQQ plot layout mode is : m
2024/08/21 10:35:19 Finished loading specified columns from the sumstats.
2024/08/21 10:35:19 Start data conversion and sanity check:
2024/08/21 10:35:19  -Removed 0 variants with nan in CHR or POS column ...
2024/08/21 10:35:20  -Removed 0 variants with CHR <=0...
2024/08/21 10:35:21  -Removed 0 variants with nan in P column ...
2024/08/21 10:35:22  -Sanity check after conversion: 22 variants with P value outside of (0,1] will be removed...
2024/08/21 10:35:22  -Sumstats P values are being converted to -log10(P)...
2024/08/21 10:35:23  -Sanity check: 0 na/inf/-inf variants will be removed...
2024/08/21 10:35:24  -Converting data above cut line...
2024/08/21 10:35:24  -Maximum -log10(P) value is 276.70008117084797 .
2024/08/21 10:35:25 Finished data conversion and sanity check.
2024/08/21 10:35:25 Start to create MQQ plot with 10687055 variants...
2024/08/21 10:35:36  -Creating background plot...
2024/08/21 10:36:03 Finished creating MQQ plot successfully!
2024/08/21 10:36:03 Start to extract variants for annotation...
2024/08/21 10:36:04  -Found 33 significant variants with a sliding window size of 500 kb...
2024/08/21 10:36:04 Start to annotate variants with nearest gene name(s)...
2024/08/21 10:36:04  -Assigning Gene name using ensembl_hg19_gtf for protein coding genes
2024/08/21 10:36:05 Finished annotating variants with nearest gene name(s) successfully!
2024/08/21 10:36:05 Finished extracting variants for annotation...
2024/08/21 10:36:05 Start to process figure arts.
2024/08/21 10:36:05  -Processing X ticks...
2024/08/21 10:36:05  -Processing X labels...
2024/08/21 10:36:05  -Processing Y labels...
2024/08/21 10:36:05  -Processing Y tick lables...
2024/08/21 10:36:05  -Processing Y labels...
2024/08/21 10:36:05  -Processing lines...
2024/08/21 10:36:05 Finished processing figure arts.
2024/08/21 10:36:05 Start to annotate variants...
2024/08/21 10:36:05  -Annotating using column GENENAME...
2024/08/21 10:36:05  -Adjusting text positions with repel_force=0.03...
2024/08/21 10:36:05 Finished annotating variants.
2024/08/21 10:36:05 Start to save figure...
2024/08/21 10:36:47  -Saved to my_first_mqq_plot1.png successfully! (overwrite)
2024/08/21 10:36:47 Finished saving figure...
2024/08/21 10:36:47 Finished creating plot successfully!
(<Figure size 20000x5000 with 1 Axes>, <gwaslab.g_Log.Log object at 0x7ff433bc04f0>)

Cloufield commented 3 weeks ago

Hi, would you please let me know your matplotlib version?
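
Since both the GWASLab and matplotlib versions matter in this thread, here is a minimal sketch for reporting them from the environment that produced the plot. It uses only the standard-library importlib.metadata module. The helper name installed_version is hypothetical, and the distribution names "gwaslab" and "matplotlib" are assumed to match what was installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them if your install differs.
for name in ("gwaslab", "matplotlib"):
    print(name, installed_version(name))
```

Running this in the same interpreter or notebook kernel used for plotting avoids reporting the version of a different environment.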

MC4R commented 3 weeks ago

Thank you very much for the prompt response. The matplotlib version is 3.8.4.

MC4R commented 3 weeks ago

Hi, I was also trying to adjust the padding before the first chromosome and after the last one using xpad in v3.4.47, but it does not work for me. Do you have any thoughts on how to fix this?

Thank you.

Cloufield commented 3 weeks ago

Hi Dev, I am fixing this padding issue and will release a new version soon (within this week). As for the annotation problem, I haven't been able to reproduce it yet.
Your option figargs={"figsize":(20,5),"dpi":500} does not match your figure, <Figure size 20000x5000 with 1 Axes>. Would you please try other figargs values and see if the problem still exists?
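
The mismatch can be checked with simple arithmetic. The sketch below assumes that the size shown in matplotlib's Figure repr is the canvas size in pixels, that is, figsize in inches multiplied by dpi. Both helper names are hypothetical.

```python
def expected_pixel_size(figsize, dpi):
    """Return (width, height) in pixels for a figsize in inches at the given dpi."""
    width_in, height_in = figsize
    return (width_in * dpi, height_in * dpi)


def implied_dpi(pixel_size, figsize):
    """Return the per-axis dpi that would produce pixel_size from figsize."""
    return (pixel_size[0] / figsize[0], pixel_size[1] / figsize[1])


# Values taken from the thread: the requested figargs and the reported figure size.
requested = expected_pixel_size((20, 5), 500)
reported = (20000, 5000)

print(requested)                        # (10000, 2500)
print(implied_dpi(reported, (20, 5)))   # (1000.0, 1000.0)
```

Under that assumption, the reported figure is twice as large in each dimension as the requested settings would give. This suggests that some other setting was in effect, although the thread does not establish which one.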

Cloufield commented 3 weeks ago

Hi, I just released v3.4.48, in which xpad is fixed. I also added xtight so that the x padding can be removed, like this:

mysumstats.plot_mqq(mode="m",
    build="19",
    anno="GENENAME",
    marker_size=(2,2),
    figargs={"figsize":(20,7),"dpi":300},
    save="my_first_mqq_plot1.png", 
    xtight=True,
    colors=["#31A0CC", "#126080"],
    sig_line=True, 
    sig_level=5e-8, 
    sig_line_color="red",
    sc_linewidth=1)

image

For the annotation, I added anno_height to manually adjust the bar length, like this:

mysumstats.plot_mqq(mode="m",
    build="19",
    anno="GENENAME",
    marker_size=(2,2),
    figargs={"figsize":(20,7),"dpi":300},
    save="my_first_mqq_plot1.png", 
    xtight=True,
    anno_height=1.2,
    colors=["#31A0CC", "#126080"],
    sig_line=True, 
    sig_level=5e-8, 
    sig_line_color="red",
    sc_linewidth=1)

image

You can try this and see if it improves your plot.

MC4R commented 3 weeks ago

Hi Yunye,

Thank you very much for your time. It seems the issue is fixed in the latest version. Here is an updated Manhattan plot. As you can see, there are many GWAS signals close to each other, and the annotation does not look great. Any thoughts on how to adjust this?

my_first_mqq_plot

I tried anno_style="tight" as well, but the gene names on chromosomes 14-15 are still far too close to each other:

my_first_mqq_plot_AT

Cloufield commented 3 weeks ago

Hi Dev, it seems that the issue is still there in your first plot: the length of the vertical arms seems to be fixed. I am wondering whether you also used other options for the plot.

There is another style, anno_style="expand", which might be helpful.

image

Additionally, you can try using anno_set to manually annotate the loci of interest, or sig_level_lead=1e-20 to set the significance threshold for selecting the loci to annotate. See https://cloufield.github.io/gwaslab/Visualization/#annotation
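
To illustrate why a stricter lead threshold thins out crowded labels, here is a simplified sketch of sliding-window lead-variant selection in plain Python. It is not GWASLab's actual implementation. The function name pick_lead_variants, the tuple layout, and the toy data are all hypothetical. Only the 500 kb window and the two thresholds come from the thread.

```python
def pick_lead_variants(variants, sig_level=5e-8, window_kb=500):
    """Pick one lead variant (smallest P) per locus.

    variants: iterable of (snpid, chrom, pos, p) tuples.
    Significant variants closer than window_kb to the current lead are
    merged into that lead's locus.
    """
    window = window_kb * 1000
    significant = sorted(
        (v for v in variants if v[3] < sig_level),
        key=lambda v: (v[1], v[2]),
    )
    leads = []
    current = None
    for v in significant:
        same_locus = (
            current is not None
            and v[1] == current[1]
            and v[2] - current[2] <= window
        )
        if same_locus:
            if v[3] < current[3]:
                current = v  # a stronger signal becomes the lead of this locus
        else:
            if current is not None:
                leads.append(current)
            current = v  # start a new locus
    if current is not None:
        leads.append(current)
    return leads


# Toy data (made up for illustration): (snpid, chrom, pos, p)
toy = [
    ("rs1", 1, 1_000_000, 1e-9),
    ("rs2", 1, 1_200_000, 1e-30),
    ("rs3", 1, 5_000_000, 1e-10),
    ("rs4", 2, 100, 1e-25),
    ("rs5", 2, 200, 0.5),
]

print([v[0] for v in pick_lead_variants(toy)])                   # ['rs2', 'rs3', 'rs4']
print([v[0] for v in pick_lead_variants(toy, sig_level=1e-20)])  # ['rs2', 'rs4']
```

With the stricter threshold, fewer loci qualify for a label, so neighbouring annotations are less likely to collide. A hand-picked list of variant IDs could serve the same purpose through anno_set, whose accepted formats are described on the documentation page linked above.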

MC4R commented 3 weeks ago

Hi Yunye,

Thank you. I will try these options. Thanks again for this wonderful package.

Cheers, Dev