OSU-SRLab / MANTIS

Microsatellite Analysis for Normal-Tumor InStability
GNU General Public License v3.0

RepeatFinder: recommended minimum k-mer length #23

messersc closed this issue 6 years ago

messersc commented 6 years ago

For generating a BED file to use with MANTIS, the RepeatFinder documentation lists this option:

-l | Minimum k-mer length (bp). Default: 1

My questions are:

1) How informative are the 1-mer regions? Will performance degrade when running MANTIS on only the 2- to 5-mer BED file?
2) Has anybody benchmarked whether this improves running time?

rbonneville commented 6 years ago
  1. In our experience, 1-mer regions tend to be more informative than 2-mer to 5-mer regions. However, there are several possible explanations for this, for instance that 1-mer loci tend to outnumber loci with longer repeat units.
  2. I believe (without having checked the code in detail) that MANTIS runtime is roughly O(n) in the number of loci, with relatively large constants from I/O. I therefore expect that a substantial reduction in the number of loci would be needed to noticeably improve running time.
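
To judge how much dropping the 1-mers would shrink a loci file, one can count the loci per repeat-unit length before rerunning anything. The sketch below is not part of MANTIS. It assumes a tab-separated BED file whose fourth column names the repeat as `(UNIT)COUNT`, for example `(A)15`; check this against your own RepeatFinder output. The helper names `kmer_length`, `count_by_kmer`, and `filter_min_kmer` are hypothetical, and the example loci are made up for illustration.

```python
import re
from collections import Counter

# Assumed fourth-column format: "(UNIT)COUNT", e.g. "(A)15" or "(CA)8".
NAME_RE = re.compile(r"^\((?P<unit>[ACGTN]+)\)(?P<count>\d+)$")


def kmer_length(bed_line):
    """Return the repeat-unit length for one BED line, or None if unparseable."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 4:
        return None
    match = NAME_RE.match(fields[3].strip())
    return len(match.group("unit")) if match else None


def count_by_kmer(lines):
    """Count loci per repeat-unit length, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        k = kmer_length(line)
        if k is not None:
            counts[k] += 1
    return counts


def filter_min_kmer(lines, min_k):
    """Keep only loci whose repeat unit is at least min_k bp long."""
    kept = []
    for line in lines:
        k = kmer_length(line)
        if k is not None and k >= min_k:
            kept.append(line)
    return kept


# Made-up example loci; replace with the lines of a real BED file.
example = [
    "chr1\t100\t115\t(A)15\t0\t+\n",
    "chr1\t200\t212\t(T)12\t0\t+\n",
    "chr1\t300\t316\t(CA)8\t0\t+\n",
    "chr2\t400\t418\t(AAT)6\t0\t+\n",
    "chr2\t500\t510\t(A)10\t0\t+\n",
]

counts = count_by_kmer(example)
kept = filter_min_kmer(example, 2)
fraction_kept = len(kept) / len(example)

for k in sorted(counts):
    print(f"{k}-mer loci: {counts[k]}")
print(f"fraction of loci kept with min k-mer length 2: {fraction_kept:.2f}")
```

If runtime really is roughly linear in the number of loci, the fraction kept gives a rough estimate of the per-locus share of the runtime that would remain. The fixed I/O cost mentioned above is not captured by this count, so the actual saving may be smaller.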