Maggi-Chen / Inspector

A tool for evaluating long-read de novo assembly results
MIT License

Question: how many times should I run inspector? #21

Open Nicholas-Kron opened 1 year ago

Nicholas-Kron commented 1 year ago

I am currently assembling a teleost genome from HiFi reads and used Inspector to correct my primary assembly. A single round gave marked improvements (structural errors dropped from 501 to 209 and small-scale errors from 74345 to 4404), but the corrected assembly is still not error-free (see table below). Is there any benefit to running Inspector for multiple rounds, similar to how older long-read assemblies needed several rounds of polishing? Or is there a risk of over-polishing or introducing new errors?

| Statistics of contigs | Initial | 1 round |
| --- | --- | --- |
| Number of contigs | 625 | 625 |
| Number of contigs > 10000 bp | 625 | 625 |
| Number of contigs > 1000000 bp | 242 | 242 |
| Total length | 2272598809 | 2272481657 |
| Total length of contigs > 10000 bp | 2272598809 | 2272481657 |
| Total length of contigs > 1000000 bp | 2170694264 | 2170577325 |
| Longest contig | 68564173 | 68545954 |
| Second longest contig length | 68389185 | 68371778 |
| N50 | 24159452 | 24149395 |
| N50 of contigs > 1 Mbp | 24159452 | 24149395 |
| Read to Contig alignment: | | |
| Mapping rate /% | 99.99 | 99.99 |
| Split-read rate /% | 9.62 | 9.62 |
| Depth | 45.576 | 45.5783 |
| Mapping rate in large contigs /% | 95.97 | 95.97 |
| Split-read rate in large contigs /% | 9.65 | 9.66 |
| Depth in large contigs | 45.7891 | 45.7926 |
| Structural error | 501 | 209 |
| Expansion | 363 | 139 |
| Collapse | 79 | 31 |
| Haplotype switch | 48 | 27 |
| Inversion | 11 | 12 |
| Small-scale assembly error /per Mbp | 32.7136491 | 1.937969438 |
| Total small-scale assembly error | 74345 | 4404 |
| Base substitution | 55461 | 2875 |
| Small-scale expansion | 11403 | 691 |
| Small-scale collapse | 7481 | 838 |
| QV | 35.4455019 | 38.49163495 |
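
For anyone who wants to compare rounds the same way, here is a minimal Python sketch that computes the per-category change between the two columns above and rechecks the small-scale error rate per Mbp. The numbers are copied from the table; the names `metrics`, `percent_change`, and `errors_per_mbp` are my own illustrative helpers and are not part of Inspector.

```python
# Error counts copied from the table: (initial, after 1 round).
# The helper names below are illustrative only, not part of Inspector.
metrics = {
    "Structural error": (501, 209),
    "Expansion": (363, 139),
    "Collapse": (79, 31),
    "Haplotype switch": (48, 27),
    "Inversion": (11, 12),
    "Total small-scale assembly error": (74345, 4404),
    "Base substitution": (55461, 2875),
    "Small-scale expansion": (11403, 691),
    "Small-scale collapse": (7481, 838),
}

# Total assembly length in bp: (initial, after 1 round).
total_length_bp = (2272598809, 2272481657)


def percent_change(before, after):
    """Relative change from before to after, in percent (negative = fewer errors)."""
    return (after - before) / before * 100.0


def errors_per_mbp(count, length_bp):
    """Error count normalised by assembly length in megabases."""
    return count / (length_bp / 1_000_000)


if __name__ == "__main__":
    for name, (before, after) in metrics.items():
        change = percent_change(before, after)
        print(f"{name}: {before} -> {after} ({change:+.1f}%)")

    small_before, small_after = metrics["Total small-scale assembly error"]
    print("Small-scale errors per Mbp (initial):",
          errors_per_mbp(small_before, total_length_bp[0]))
    print("Small-scale errors per Mbp (1 round):",
          errors_per_mbp(small_after, total_length_bp[1]))
```

Running the same comparison after each additional round would show whether the counts are still falling or have plateaued. In the data above, inversion is the only category that went up (11 to 12).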