ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Add Red prefilter #1317

Closed glennhickey closed 8 months ago

glennhickey commented 8 months ago

I tried to put a cactus ancestor through Red RepeatMasking via the cactus preprocessor and it crashed right away -- something I haven't seen on any real or test data so far.

After some trial and error, it looks like Red will crash if the input contains a contig that is

This PR adds a prefilter to catch these cases (eventually it would be nice to fix this properly in Red's own code). Contigs that are smaller than 20kb, or that are more than 98% a single base, are filtered out before Red runs and added back in afterwards. In the second case, the giant monomer runs in the contig are softmasked before the contig is added back.
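
For illustration, the prefilter logic described above could be sketched roughly as follows. This is not the code in this PR: the function names, the dict-of-sequences input, and the `MIN_RUN` cutoff for what counts as a "giant" monomer run are all assumptions made for the example. Only the 20kb and 98% thresholds come from the description.

```python
from collections import Counter
from itertools import groupby

MIN_LENGTH = 20_000        # from the description: contigs under 20kb skip Red
MAX_BASE_FRACTION = 0.98   # from the description: more than 98% a single base
MIN_RUN = 1_000            # hypothetical cutoff for a "giant" monomer run


def dominant_fraction(seq):
    """Fraction of the contig taken up by its most common character (case-insensitive)."""
    if not seq:
        return 1.0
    counts = Counter(seq.upper())
    return counts.most_common(1)[0][1] / len(seq)


def softmask_runs(seq, min_run=MIN_RUN):
    """Lowercase every run of one repeated base that is at least min_run long."""
    out = []
    for _, group in groupby(seq, key=str.upper):
        run = "".join(group)
        out.append(run.lower() if len(run) >= min_run else run)
    return "".join(out)


def prefilter(contigs, min_length=MIN_LENGTH,
              max_base_fraction=MAX_BASE_FRACTION, min_run=MIN_RUN):
    """Split contigs into those sent to Red and those held back.

    contigs maps contig name -> sequence (hypothetical input format).
    Held-back contigs that are dominated by one base get their long
    monomer runs softmasked; small contigs are held back unchanged.
    """
    to_red, held_back = {}, {}
    for name, seq in contigs.items():
        if len(seq) < min_length:
            held_back[name] = seq
        elif dominant_fraction(seq) > max_base_fraction:
            held_back[name] = softmask_runs(seq, min_run)
        else:
            to_red[name] = seq
    return to_red, held_back


def merge_back(original_order, masked_by_red, held_back):
    """Reassemble the output in the original contig order."""
    return {name: masked_by_red[name] if name in masked_by_red else held_back[name]
            for name in original_order}
```

In this sketch, `to_red` would be written out and passed to Red, and `merge_back` would then combine Red's masked output with the held-back contigs so that the final FASTA contains every input contig in its original order.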