0xTCG / biser

A fast tool for detecting and decomposing segmental duplications in genome assemblies
MIT License

Does the number of contigs affect the efficiency of BISER? #35

Status: Open. life404 opened this issue 9 months ago.

life404 commented 9 months ago

Hi, first, thanks for your tool. I ran BISER on a genome assembly with 25 chromosome-level scaffolds and 1,983 short contigs (the longest contig is ~300 kb and the shortest is ~2 kb). BISER ran for a long time (> 2 days) and used up all the memory (> 1 TB). However, it finishes in a very short time on another genome with only 194 contigs.

So I tried SEDEF on the genomes with a high number of contigs and observed that SEDEF also needed a substantial amount of time to process the short contigs. For instance, after the SEDEF translation step, group 22 (more than 800 contigs) and group 23 (over 300 contigs) each took more than 48 hours to finish.

Will a large number of short contigs significantly increase the runtime? Should I filter out the shorter contigs (e.g., < 10 kb)? Could you please provide some suggestions?
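If you do decide to drop short contigs before rerunning, a length filter is simple to write. The sketch below is a minimal, stdlib-only illustration (the function names `read_fasta`, `filter_by_length`, and `write_fasta` are hypothetical, and the 10 kb cutoff is only the example value from the question, not a threshold recommended by BISER or SEDEF). It keeps any record whose sequence is at least `min_len` bases long and leaves the case of each base untouched, so existing soft-masking is preserved.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines: keep the original case (soft-masking is lowercase).
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_len: int = 10_000) -> List[Record]:
    """Keep only records whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def write_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Serialize records back to FASTA text, wrapping sequences at `width`."""
    out = []
    for h, s in records:
        out.append(">" + h)
        out.extend(s[i:i + width] for i in range(0, len(s), width))
    return "\n".join(out) + "\n"
```

For example, `open("assembly.fa")` can be passed straight to `read_fasta`, and the result of `write_fasta(filter_by_length(...))` written to a new file (the file names are placeholders). Whether removing contigs is acceptable depends on the study, since any duplications that lie on the dropped contigs will no longer be reported.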

Thank you

inumanag commented 8 months ago

Hi @life404

How are you running BISER? Are your contigs soft-masked with RepeatMasker?
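Soft-masking (for example, RepeatMasker run with `-xsmall`) marks repeats as lowercase bases, so one quick way to answer the second question is to measure how much of the assembly is lowercase. The sketch below is a hypothetical helper, not part of BISER. It counts only A/C/G/T in either case and ignores `N` and other IUPAC codes. A result near zero suggests the sequences are not soft-masked.

```python
from typing import Iterable


def soft_masked_fraction(seqs: Iterable[str]) -> float:
    """Return the fraction of A/C/G/T bases (any case) that are lowercase.

    N and other ambiguity codes are excluded from both the numerator
    and the denominator, so runs of Ns do not skew the result.
    """
    total = 0
    lower = 0
    for seq in seqs:
        for c in seq:
            if c in "ACGT":
                total += 1
            elif c in "acgt":
                total += 1
                lower += 1
    return lower / total if total else 0.0
```

The input is any iterable of sequence strings, for example the second element of each record produced by a FASTA reader.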