a-slide / NanoCount

EM based transcript abundance from nanopore reads mapped to a transcriptome with minimap2
https://a-slide.github.io/NanoCount/
MIT License
53 stars 5 forks source link

ZeroDivisionError: division by zero #5

Closed callumparr closed 4 years ago

callumparr commented 4 years ago

I have started process my sequencing libraries for direct RNA with master of pores and it ran smoothly through my first library. Then when I process the next one it failed at the count job.

https://github.com/biocorecrg/master_of_pores/issues/60#issue-610633267

I downloaded NanoCount and ran through manually and it gives same error:

(base) callum@d4-lm2:~$ /home/callum/.local/bin/NanoCount -i master_of_pores/NanoPreprocess/work/3f/71df9957d8b2151cbe1af158b1896f/fast5_pass.minimap2.sorted.bam -o Nano.count --verbose
Run parameters
    Minimal read length:50
    Minimal aligned fraction of query read:0.5
    Equivalent threshold:0.9
    Scoring value:alignment_score
    Convergence target:0.005
Parse Bam file and filter low quality hits
Traceback (most recent call last):
  File "/home/callum/.local/bin/NanoCount", line 8, in <module>
    sys.exit(main())
  File "/home/callum/.local/lib/python3.6/site-packages/NanoCount/__main__.py", line 48, in main
    verbose =args.verbose)
  File "/home/callum/.local/lib/python3.6/site-packages/NanoCount/NanoCount.py", line 67, in __init__
    self.read_dict = self._parse_bam ()
  File "/home/callum/.local/lib/python3.6/site-packages/NanoCount/NanoCount.py", line 163, in _parse_bam
    if self.scoring_value == "alignment_score" and hit.align_score/best_hit.align_score < self.equivalent_threshold:
ZeroDivisionError: division by zero

I ran flagstat and I cannot see issue with the BAM file generated.

3148715 + 0 in total (QC-passed reads + QC-failed reads)
1676236 + 0 secondary
82251 + 0 supplementary
0 + 0 duplicates
3148715 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Day1_01_DRS.flagstat (END)
a-slide commented 4 years ago

Hi @callumparr

Thanks for reporting the issue. I think I just fixed it in version 0.2. Can you please check ?

Thanks

callumparr commented 4 years ago

Hey it still gives error on that library. I've checked other libraries and there doesn't seem to be an issue so I guess there is just something quirky about the BAM file generated.

a-slide commented 4 years ago

Damn! It looks like some of your primary reads have an alignment score of 0 (AS tag). I did not know it was possible. Anyway I just pushed a hotfix (v0.2.1) that should solve the issue for good Can you run NanoCount in verbose mode ? You should see reads in the category "Reads with zero score".

callumparr commented 4 years ago

Thank you for your efforts. It now runs on that library. This was the DEBUG output.

    Parse Bam file and filter low quality hits
    [DEBUG]: Summary of reads parsed in input bam file
    [DEBUG]:    Mapped hits: 3,102,305
    [DEBUG]:    Valid best hit: 1,346,053
    [DEBUG]:    Valid secondary hit: 1,254,780
    [DEBUG]:    Invalid secondary hit: 412,875
    [DEBUG]:    Wrong strand hits: 44,852
    [DEBUG]:    Best hit with low query fraction aligned: 31,192
    [DEBUG]:    Reads too short: 3
    [DEBUG]:    Reads with zero score: 1
    Generate initial read/transcript compatibility index
a-slide commented 4 years ago

Great. I don't quite understand why the aligner allows a primary read with an alignment score of 0 but, hey, this is the joy of bioinformatics.

Thanks for reporting the issue

callumparr commented 4 years ago

Its all rather worrying.