I'm guessing it finds a lot of repeats in those poly-N runs!
Do we need to mask long runs of any single base?
I have a sequence of around 100 kbp, but it is padded at both ends with Ns so that the total length is 2.8 Mbp. Prokka gets stuck at "searching for CRISPR repeats", and although it still finishes, it takes >10x as long as annotating a 2.8 Mbp sequence with no Ns.
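To make the masking idea concrete, here is a rough sketch of how long single-base runs could be located and skipped before any repeat search. This is not how minced or Prokka work internally; the function names `find_long_runs` and `unmasked_segments` and the `min_run=50` threshold are made up for illustration.

```python
import re


def find_long_runs(seq, min_run=50):
    """Return (start, end, base) for each run of one repeated character
    that is at least min_run long. 'end' is exclusive.

    min_run is a hypothetical threshold, not a minced setting.
    """
    # (.) captures one character; \1{n,} requires n or more further copies.
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]


def unmasked_segments(seq, min_run=50):
    """Yield (offset, subsequence) for the stretches between long runs.

    The offset lets hits found in a segment be mapped back to
    coordinates in the original sequence.
    """
    pos = 0
    for start, end, _base in find_long_runs(seq, min_run):
        if start > pos:
            yield pos, seq[pos:start]
        pos = end
    if pos < len(seq):
        yield pos, seq[pos:]
```

For a sequence like the one reported (a real region flanked by long N padding), `unmasked_segments` would yield just the real region with its offset, so a repeat search would only have to scan the ~100 kbp that matters.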
I've got this report for prokka which sounds like a minced bug: https://github.com/tseemann/prokka/issues/116