JiaoLaboratory / CRAQ

Identification of errors in draft genome assemblies with single-base pair resolution for quality assessment and improvement
https://doi.org/10.1038/s41467-023-42336-w
MIT License

Illegal division by zero #13

Closed pants08 closed 6 months ago

pants08 commented 6 months ago

Hello! I ran CRAQ on a small partial genome with SMS reads only, and I ran into this error.

Running CRAQ benchmark analysis ......
CMD: /data/wenhuaming/software/CRAQ/bin/../src/runAQI_SMS.sh -g good_zhou_haploid.fa -z seq.size -e LRout/LR_eff.size -C LRout/LR_putative.SE.SH -D LRout/LR_sort.depth -r 0.75 -p 0.4 -q 0.6 -R 0.75 -P 0.4 -Q 0.6 -n 10 -s 391 -w 500000 -j 1 -b F -y -t -x /data/wenhuaming/data/tomato/CRAQ/original/haploid/output/seq.size -v F
[M::worker_pipeline:: Filter putative LER]
[M::worker_pipeline:: Quality benchmarking
[M::worker_pipeline:: Create CRAQ metrics]
[M::worker_pipeline:: Create final report]
Illegal division by zero at /data/wenhuaming/software/CRAQ/src/final_short_report_minlen.pl line 23, line 12.
Illegal division by zero at /data/wenhuaming/software/CRAQ/src/final_short_report_minlen.pl line 23, line 12.
CRAQ analysis is finished. Check current directory runAQIout for final results!

The alignment seems fine, but out_final.Report contains nothing except the header. I am sure my genome has no zero-length contigs. The intermediate files appear to be deleted at the end of the run, so I cannot debug this on my own.

BTW, the -t option is not working: when I set it to 64, the alignment still ran with the default 10 threads. The -pl option does not seem to work either, since I did not see any plot generated when it was on. These are not related to this issue, but I thought you might want to know.

Best wishes!

JiaoLaboratory commented 6 months ago

Thank you very much for the report. I have not yet identified why the bug occurs. By the way, did you encounter any issues when running 'perl craq.pl -g assembly.fa -sms SMS.fa' with the test data? Also, would you mind sharing your CRAQ version (is it the latest? Re-downloading might help) and the specific log file with me?

pants08 commented 6 months ago

Thank you for your reply! I had no problems running on the test data in your example. Here is the result.

Short Report:

Chr Covered.Rate Low-confident.Rate Avg.CRH Avg.CSH Avg.CRE(R-AQI) Avg.CSE(S-AQI)
Genome 0.997219611498728 0.0218650288644202 0 0 4.61092620443156(63.0594268772554) 1.53697540147719(21.5030499779253)
Chr1 0.997219611498728 0.0218650288644202 0 0 4.62378211505665(62.978410329888) 1.54126070501888(21.4111000391095)

I have successfully run CRAQ on other genomes with SMS reads only before, if that is what you are worried about, but I do not understand what difference causes the failure with this genome. I am using CRAQ Version: 1.0.9-alpha installed via git, and reinstalling did not solve the problem.

Here is the log file from a failed run: CRAQ.log

JiaoLaboratory commented 6 months ago

Thank you very much. There really is a bug, probably triggered by some tiny fragments in the genome file; I had forgotten to include the fix in the latest version. I have just fixed it (although I'm not quite sure that it is the cause of your results). Would you mind re-downloading CRAQ and running it again? Sorry for the bug! If you run into any trouble, please let me know.
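For readers hitting the same error, one way to check the "tiny fragment" hypothesis before rerunning is to list the sequence lengths in the assembly FASTA and flag the short ones. The sketch below is not part of CRAQ; the function names and the 1000 bp threshold are hypothetical choices for illustration, and it makes no claim about what the actual fix in CRAQ does.

```python
import io


def fasta_lengths(handle):
    """Yield (name, length) for each record in a FASTA text stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            fields = line[1:].split()
            # Use the first whitespace-delimited token as the sequence name.
            name, length = (fields[0] if fields else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def short_sequences(handle, min_len=1000):
    """Return records shorter than min_len (hypothetical threshold)."""
    return [(n, l) for n, l in fasta_lengths(handle) if l < min_len]


if __name__ == "__main__":
    # Toy input; replace with open("assembly.fa") for a real file.
    demo = io.StringIO(">chrA desc\nACGTACGT\nACGT\n>tiny\nAC\n>empty\n")
    for name, length in short_sequences(demo, min_len=5):
        print(name, length)
```

Sequences reported here are candidates for closer inspection or removal before running the analysis; whether they are the actual trigger depends on how the report script normalizes its values.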

pants08 commented 6 months ago

Thanks! It worked fine after you fixed the bug. But for the genome fragments you mentioned, the results look a little weird, and the whole-genome results do not seem to be affected by these fragments' bad scores. Is this normal?

Short Report:

Chr Covered.Rate Low-confident.Rate Avg.CRH Avg.CSH Avg.CRE(R-AQI) Avg.CSE(S-AQI)
Genome 0.998676931819965 0.00611926234137116 1.65202831906945 0 4.33657433755729(64.8134246374451) 0.206503539883681(81.3423382027709)
pat_chr05_1 0.996541086859197 0.0338115307058645 0 0 2.4616445016093(78.179364826227) 0(100)
mat_chr05_1 0.999998596014366 0 1.40398760559742 0 2.80797521119484(75.518122825059) 0(100)
mat_chr11_1 0.999998300498974 0 0 0 3.39900782961454(71.1840946009616) 0(100)
pat_chr09_5 0.994981256730283 0 0 0 4.22214684908218(65.5593278301561) 0(100)
pat_chr03_1 0.999997775523642 0 0 0 4.44896261314268(64.0890757748803) 0(100)
mat_chr07_4 0.999997287098253 0.00576221894261655 18.990363746853 0 5.42581821338658(58.1245641072581) 0(100)
pat_chr05_2 0.999996875019531 0 0 0 6.24998046881104(53.5262473949222) 0(100)
pat_chr12_2 0.999996428596939 0 0 0 7.14283163274417(48.954290838485) 0(100)
mat_chr01_1 0.999971200313341 0 0 0 11.5202064420994(31.5997605103624) 0(100)
pat_chr08_3 0.99999762299232 0 0 0 4.75402666058151(62.1634694870151) 2.37701333029076(9.28274087126674)
mat_chr10_2 0.999025341130604 0 0 0 1951.21951219512(1.81808124682901e-83) 0(100)
mat_chr06_1 0.000974658869395711 0 0 0 2000000(0) 0(100)
mat_chr05_2 0.000974658869395711 0 0 0 2000000(0) 0(100)
mat_chr12_2 0.000189537528430629 0 0 0 2000000(0) 0(100)
mat_chr081 0.000974658869395711 0 0 0 2000000(0) 0(100)

JiaoLaboratory commented 6 months ago

Your result looks normal. The final AQI values are somewhat influenced by the low scores of these fragments. For example, in your results pat_chr08_3 has a very low S-AQI, which lowers the overall S-AQI for the entire genome. Based on these findings, you may want to pay special attention to pat_chr08_3, as it may contain misjoins.
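To see why the tiny fragments score so badly while the genome-level value barely moves, it helps to look at how the bracketed AQI values relate to the error densities in the reports in this thread. The numbers are consistent with R-AQI = 100·exp(-0.1·CRE) and S-AQI = 100·exp(-CSE); these scale factors are inferred from the reported values, not taken from the CRAQ source. The sketch below also assumes, purely for illustration, that densities are events per Mb of effective (covered) length and that a genome-level density pools events over the total length. All function names and the event counts in the example are hypothetical.

```python
import math


def aqi(density_per_mb, scale):
    """AQI-style score: 100 * exp(-scale * density)."""
    return 100.0 * math.exp(-scale * density_per_mb)


def per_mb(n_events, effective_bp):
    """Events per Mb; returns None instead of dividing by zero."""
    if effective_bp == 0:
        return None
    return n_events / effective_bp * 1e6


def pooled_per_mb(records):
    """Pool (n_events, effective_bp) pairs into one density (assumed scheme)."""
    total_events = sum(n for n, _ in records)
    total_bp = sum(bp for _, bp in records)
    return per_mb(total_events, total_bp)


if __name__ == "__main__":
    # Values copied from the reports in this thread.
    print(aqi(4.61092620443156, 0.1))   # compare with the reported R-AQI
    print(aqi(2.37701333029076, 1.0))   # compare with the reported S-AQI
    # Hypothetical counts: a few events on ~1 kb explode per Mb,
    # but pooled with a 5 Mb contig they change the total very little.
    print(per_mb(2, 1025))
    print(pooled_per_mb([(10, 5_000_000), (2, 1)]))
```

Under these assumptions, a fragment with only a handful of covered bases gets an enormous per-Mb density and an AQI near 0, yet contributes almost nothing to a length-pooled genome value, whereas a full-length sequence such as pat_chr08_3 carries real weight.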

pants08 commented 6 months ago

OK! Thank you sincerely for your help!