Maggi-Chen / Inspector

A tool for evaluating long-read de novo assembly results
MIT License

An error occurred: ZeroDivisionError: float division by zero #9

Open zhangwenda0518 opened 2 years ago

zhangwenda0518 commented 2 years ago

Dear teacher, thank you for your work. I'm using Inspector for assembly evaluation and correction, and it fails with ZeroDivisionError: float division by zero. The log (Inspector.log) is as follows:

Inspector starting... 23/06/2022 11:01:15
Start Assembly evaluation with contigs: ['../../../../10.resoult/genome_assemblyed/pbipa.fasta']
TIME: Before read mapping 1.2830026149749756
TIME: Read Alignment: 178.04207849502563

Attachment: nohup.txt

Inspector runs normally on my other assemblies. Could you suggest a solution?

In addition, I have two questions. First, what QV value indicates a good assembly? Second, what is the difference between Inspector and NextPolish for error correction, and do I need to run additional correction software after Inspector?

Maggi-Chen commented 1 year ago

Hello Wenda,

Thank you for your interest in our tool. Based on the error message, there seems to be a problem processing the BAM file (read_to_contig.bam): Inspector apparently found no mapped reads in it, which is very unusual. What species are you working with? Could you check the size of read_to_contig.bam in the output directory? If the file size is reasonable (comparable to the size of the input fastq.gz), could you look into the BAM and check whether all reads are unaligned?
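One way to do that check is to count mapped and unmapped records in read_to_contig.bam. The sketch below assumes pysam is installed (Inspector already uses it); `count_mapping` and `check_bam` are hypothetical helper names, not part of Inspector. `samtools flagstat read_to_contig.bam` gives a similar summary from the command line.

```python
def count_mapping(reads):
    """Return (mapped, unmapped) counts for an iterable of alignment records.

    Each record only needs an ``is_unmapped`` attribute, as pysam's
    AlignedSegment provides. Secondary and supplementary alignments are
    counted as separate records, so this counts records, not unique reads.
    """
    mapped = unmapped = 0
    for read in reads:
        if read.is_unmapped:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


def check_bam(path):
    """Count mapped/unmapped records in a BAM file using pysam."""
    import pysam  # imported here so count_mapping() works without pysam

    # until_eof=True streams every record, including unmapped ones,
    # and does not require a BAM index.
    with pysam.AlignmentFile(path, "rb") as bam:
        return count_mapping(bam.fetch(until_eof=True))
```

For example, `check_bam("read_to_contig.bam")` (adjust the path to your output directory) returning zero mapped records would match the ZeroDivisionError symptom.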

And for your questions:

  1. The higher the QV of an assembly, the better. I would usually consider an assembly good when its QV is higher than 30, which corresponds to an error rate below 0.1%.
  2. For error correction, I would suggest testing different polishing tools and seeing which one gives the best results on your datasets.
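The QV-to-error-rate relationship above is the standard Phred scale, QV = -10 · log10(error rate), so QV 30 means one error per 1,000 bases. A minimal sketch of the conversion (this is the generic Phred formula, not Inspector's own code):

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error rate in (0, 1] to a Phred-scaled QV."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


for qv in (20, 30, 40):
    print(f"QV {qv}: error rate {qv_to_error_rate(qv):.4%}")
```

Each additional 10 QV points reduces the error rate tenfold, which is why QV 40 corresponds to 0.01%.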

Thanks, Maggi

harris-2374 commented 1 year ago

I resolved the same issue by running dos2unix on the contig FASTA file before running Inspector. Testing showed that FASTA files with CRLF (Windows-style) line endings prevented pysam from fetching chromosomes at line 358, in the _detectsortbam function of _debreakdetect.py. As a result, Inspector did not create the files in the _mapdepth/ folder that the coverage calculation at line 131 of inspector.py requires, and that calculation throws the ZeroDivisionError because it concludes that no reads are mapped. After converting the FASTA file, Inspector ran without error.
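If dos2unix is not available, the same conversion can be done in Python. This is a minimal sketch assuming the assembly fits in memory (it is read whole); `to_lf` and `fix_fasta_line_endings` are hypothetical helper names, and the fixed copy goes to a new file so the original is preserved.

```python
from pathlib import Path


def to_lf(data):
    """Convert CRLF (Windows) line endings in a bytes object to LF (Unix)."""
    return data.replace(b"\r\n", b"\n")


def fix_fasta_line_endings(src, dst):
    """Write an LF-only copy of the FASTA at src to dst.

    Returns the number of CRLF line endings that were converted,
    so a return value of 0 means the file was already LF-only.
    """
    data = Path(src).read_bytes()
    Path(dst).write_bytes(to_lf(data))
    return data.count(b"\r\n")
```

For example, `fix_fasta_line_endings("pbipa.fasta", "pbipa.lf.fasta")` writes a converted copy that can then be passed to Inspector; the output name is only illustrative.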

I also tested this with the test data that comes with Inspector, and when contig_test.fa is run through unix2dos and then passed to Inspector, the ZeroDivisionError is thrown as expected.
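To reproduce the failure without unix2dos, a CRLF copy of the test assembly can be made in Python. This sketch mirrors what unix2dos does for LF-only input; `to_crlf` and `make_crlf_copy` are hypothetical helper names, and the paths in the usage line are illustrative.

```python
from pathlib import Path


def to_crlf(data):
    """Convert LF line endings to CRLF, leaving existing CRLF endings intact."""
    # Normalize to LF first so existing CRLF endings are not turned into CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def make_crlf_copy(src, dst):
    """Write a CRLF-terminated copy of the file at src to dst."""
    Path(dst).write_bytes(to_crlf(Path(src).read_bytes()))
```

For example, `make_crlf_copy("contig_test.fa", "contig_test.crlf.fa")` (adjust the path to wherever the test data lives) produces a file that should trigger the same ZeroDivisionError when passed to Inspector.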