chimeric mitochondrial-nuclear scaffolds

Question or Expected behavior

I have generated genome assemblies for two different species of butterfly. The assembly sizes are ~700-800 Mb after running purge_dups. In both assemblies I find a large chimeric scaffold, several Mbp in length, that contains the entire ~15 kb mitogenome embedded in it. The ~15 kb mitogenome portion of each scaffold is 99.9-100% identical to the mitogenome assembled independently from Illumina data, so this is clearly a mis-assembly.

1) How can I avoid these chimeric scaffolds? Is the much higher expected coverage of the mitogenome not used to prevent this from happening?

2) The presence of this chimeric scaffold makes me worry that there may be other chimeric scaffolds involving only nuclear sequence that are not so easily detected.
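To make concern 2 more concrete, here is a minimal sketch of the kind of screen I have in mind: scanning per-window read depth along a scaffold and flagging stretches far above the scaffold median, as an embedded mitogenome would be. This is not a NextDenovo feature. The function name `flag_depth_outliers`, the `window_size` and `fold` parameters, and the example depths are all hypothetical, and it assumes the per-window depths have already been computed by some other tool. It would not catch nuclear-nuclear chimeras where both sides have similar depth.

```python
from statistics import median


def flag_depth_outliers(depths, window_size=1000, fold=5.0):
    """Return (start, end) intervals whose depth exceeds fold x the median.

    depths      -- mean read depth per consecutive window along one scaffold
    window_size -- window length in bp (assumed; must match the depth input)
    fold        -- how many times the median counts as an outlier (assumed)
    """
    if not depths:
        return []
    med = median(depths)
    threshold = fold * med
    intervals = []
    start = None
    for i, depth in enumerate(depths):
        high = med > 0 and depth > threshold
        if high and start is None:
            start = i  # open a new high-depth run
        elif not high and start is not None:
            intervals.append((start * window_size, i * window_size))
            start = None
    if start is not None:
        # close a run that extends to the end of the scaffold
        intervals.append((start * window_size, len(depths) * window_size))
    return intervals


# Made-up depths: three windows with mitochondrial-like coverage
example = [30, 32, 29, 31, 900, 950, 880, 30, 28, 33]
print(flag_depth_outliers(example))  # [(4000, 7000)]
```

The median is used as the baseline so that a short high-depth insert does not inflate the reference depth it is compared against.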

Thanks, KD

Operating system

LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core

GCC

gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)

Python

Python 3.7.4

NextDenovo

nextDenovo v2.5.0

