1) How informative are the 1-mer regions? Will performance degrade when using MANTIS just on the 2- to 5-mer bed?
2) Did anybody benchmark if this improves running time?
In our experience, 1-mer regions tend to be more informative than 2-mer to 5-mer regions. There are several possible explanations for this, for instance that 1-mer loci tend to outnumber loci with longer repeat units.
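To check how the loci in a given bed file are distributed across repeat-unit lengths, something like the following rough sketch could be used. It assumes the fourth column holds the repeat as `(UNIT)COUNT`, e.g. `(A)12`; that layout, the function name, and the example rows are assumptions for illustration, so adjust the pattern to whatever your bed file actually contains.

```python
import re
from collections import Counter

# Assumed layout of the 4th BED column: "(UNIT)COUNT", e.g. "(A)12".
# This is an assumption about the file, not something taken from the docs.
_NAME = re.compile(r"^\(([A-Za-z]+)\)\d+$")


def count_loci_by_unit_length(bed_lines):
    """Count loci per repeat-unit length (1 for 1-mers, 2 for 2-mers, ...)."""
    counts = Counter()
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        match = _NAME.match(fields[3])
        if match:
            counts[len(match.group(1))] += 1
    return counts


# Made-up example rows, only to show the expected input shape.
example = [
    "chr1\t100\t112\t(A)12\t0\t+",
    "chr1\t500\t520\t(T)20\t0\t+",
    "chr2\t900\t912\t(AC)6\t0\t+",
    "chr3\t300\t315\t(GAT)5\t0\t+",
]
counts = count_loci_by_unit_length(example)
```

Running this on the full bed and on the 2- to 5-mer bed would show how many loci are actually dropped by raising the minimum k-mer length.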
I believe (without having looked at the code in detail) that MANTIS runtime is roughly O(n) in the number of loci, with relatively large constants from I/O. I would therefore expect that the number of loci has to drop significantly before running time improves substantially.
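As a back-of-the-envelope illustration of that expectation, here is a minimal sketch of a linear cost model. It is not a benchmark and not derived from the MANTIS code: it assumes the I/O cost shows up as a fixed overhead plus a constant per-locus cost, and all names and numbers are hypothetical.

```python
def estimated_speedup(n_total, n_kept, fixed_cost, per_locus_cost):
    """Speedup from keeping n_kept of n_total loci under a linear cost model.

    Assumed model (hypothetical): runtime = fixed_cost + per_locus_cost * n.
    """
    before = fixed_cost + per_locus_cost * n_total
    after = fixed_cost + per_locus_cost * n_kept
    return before / after


# Hypothetical numbers: 60 s fixed overhead, 0.01 s per locus, half the loci kept.
speedup = estimated_speedup(n_total=10_000, n_kept=5_000,
                            fixed_cost=60.0, per_locus_cost=0.01)
```

With these made-up numbers, halving the locus count gives a speedup of only about 1.45x, because the fixed part of the runtime does not shrink. Measuring the two costs on real data would be needed to say anything concrete.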
The documentation states
-l | Minimum k-mer length (bp). Default: 1
when generating a bed file to use with MANTIS.
My questions are:
1) How informative are the 1-mer regions? Will performance degrade when using MANTIS just on the 2- to 5-mer bed?
2) Did anybody benchmark if this improves running time?