akcorut / kGWASflow

kGWASflow is a Snakemake workflow for performing k-mers-based GWAS.
https://github.com/akcorut/kGWASflow/wiki
MIT License
28 stars 8 forks source link

Manhattan plots fail #25

Open Ax3man opened 10 months ago

Ax3man commented 10 months ago

Hi Kivanc,

I've updated kGWASflow and tried to run the analysis including aligning the kmers. Unfortunately, generating the Manhattan plot yields errors, with unhelpful logs:

y_max: 19.284643066713652
y_min: 9.220349838454867
Plotting...
/MankFlex/wouter/color_selection_lines/kmergwas/kgwasflow/.snakemake/scripts/tmpk48uhlxn.plot_manhattan.py:141: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  align_kmers_sam_with_all_chrom_sorted = align_kmers_sam_with_all_chrom.sort_values(by=["chr", "bp"], key=natsort.natsort_keygen())

I don't really care about the manhattan plots, I can make those myself, but turning them off here didn't do anything. I tried to check, and as far as I can tell, that option does not appear to be used anywhere? Shouldn't the manhattan rule have an if statement like here?

I'll just continue without the aligned kmers, but thought I would let you know.

akcorut commented 10 months ago

Hi @Ax3man,

Thank you for reporting this issue. You are right about the plot_manhattan setting. I originally planned manhattan plot generation to be optional, but later I decided to keep it permanent as part of align_kmers phase since it wasn't costing much time or memory, and technically any output from align_kmers step should work with plot_manhattan rule without any problem.

Would you mind sharing your SAM file output generated during align_kmers step? It should be in "results/align_kmers/{your_pheno}/{your_pheno}_kmers_alignment.sam". I would like to see what caused the problem during Manhattan plot generation so I can fix it.

I will also make the necessary changes regarding plot_manhattan setting.

Best, Kivanc

cedarwarman commented 10 months ago

Hi Kivanc,

I'm getting the same error, I checked for the SAM output as you suggested and there wasn't one. All of log files in that directory were empty, except kmers_align.bowtie2.log, which read:

54 reads; of these:
  54 (100.00%) were unpaired; of these:
    0 (0.00%) aligned 0 times
    36 (66.67%) aligned exactly 1 time
    18 (33.33%) aligned >1 times
100.00% overall alignment rate

I'm assuming this is why the Manhattan plot isn't working, since the Manhattan plot error message is talking about an empty Series. Not sure why the SAM output isn't being created or why the Manhattan step is running without it, I will let you know if I track it down.

Thanks,

Cedar

cedarwarman commented 9 months ago

I've generated the SAM output, here are the first few lines:

@HD     VN:1.5  SO:unsorted     GO:query
@SQ     SN:SL4.0ch00    LN:9643250
@SQ     SN:SL4.0ch01    LN:90863682
@SQ     SN:SL4.0ch02    LN:53473368
@SQ     SN:SL4.0ch03    LN:65298490
@SQ     SN:SL4.0ch04    LN:64459972
@SQ     SN:SL4.0ch05    LN:65269487
@SQ     SN:SL4.0ch06    LN:47258699
@SQ     SN:SL4.0ch07    LN:67883646
@SQ     SN:SL4.0ch08    LN:63995357
@SQ     SN:SL4.0ch09    LN:68513564
@SQ     SN:SL4.0ch10    LN:64792705
@SQ     SN:SL4.0ch11    LN:54379777
@SQ     SN:SL4.0ch12    LN:66688036
@PG     ID:bowtie2      PN:bowtie2      VN:2.4.5        CL:"/xdisk/rpalaniv/cedar/kmers-gwas/flowers_varitome_test_dir/.snakemake/conda/221bdcf96cc8a42228d668ab0f75056c_/bin/bowtie2-align-s --wrapper basic-0 -p 1 -x /xdisk/rpalaniv/cedar/kmers-gwas/flowers_varitome_test_dir/resources/ref/genome/bowtie2_index/genome -f /xdisk/rpalaniv/cedar/kmers-gwas/flowers_varitome_test_dir/results/fetch_kmers/anther_pistil_ratio_kmers_list.fa -S /home/u16/cedar/scratch/kGWASflow/aligned_kmers/aligned_kmers.sam"
kmer1_3.707106e-11      16      SL4.0ch09       48997168        1       31M     *       0       0       TATTTCCGCACTTTGTTAGATATTTGGTTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XS:i:-6 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:24G6       YT:Z:UU
kmer2_3.415117e-12      16      SL4.0ch01       10753768        24      31M     *       0       0       GAGGTAATTATCAATTTCTTTTTGACATTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:25T5       YT:Z:UU
kmer3_5.486155e-13      16      SL4.0ch03       15427129        24      31M     *       0       0       TCACTCTGTCTGAAAAAGAATGTCCTAGTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:2C28       YT:Z:UU
kmer4_8.487645e-12      16      SL4.0ch05       48950027        1       31M     *       0       0       AACTCTCCTTTGTAGGTATGTACCTCCATTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-6 XS:i:-6 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:1C29       YT:Z:UU

I think the problem is that the regex in the plot_manhattan.py script is not recognizing the chromosome format. I'm using the tomato genome.

VanOverbeeke commented 1 week ago

Hi @cedarwarman, I had the same issue today and was able to continue making Manhattan plots by replacing this line: https://github.com/akcorut/kGWASflow/blob/27654c8398249a5581c1463b2036211cb3fdbc0d/workflow/scripts/plot_manhattan.py#L73 with name = chromosome_name and it will let you go on with your own chromosome names. If you have many smaller contigs that you want to leave out, you'll have to tweak the regex yourself.