While testing cases related to #503, it became apparent that for very small readsets (e.g., two reads) the default filtering parameter `-F` is too stringent. Values of `-F` smaller than approximately 0.001 produce no overlaps.
The right way to fix this is to properly handle repetitive minimizers. We could do this with a fixed mask, a weighting function like that used in WinnowMap, or by rearchitecting the sketch handling in cudamapper to function like MashMap. As a temporary fix, it might make sense to scale the filtering parameter value with the number of reads in the input data (probably as 1 / (number of reads)^2, with a minimum of 2e-4).
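
A minimal sketch of the temporary fix, assuming the scaling is exactly 1 / (number of reads)^2 clamped from below at 2e-4. The function name `scaled_filtering_parameter` and its arguments are hypothetical and are not part of cudamapper; the value it returns would be passed where `-F` is used today.

```python
def scaled_filtering_parameter(num_reads: int, floor: float = 2e-4) -> float:
    """Return a filtering parameter that loosens for small readsets.

    Hypothetical helper, not cudamapper code. Assumes the proposed
    scaling 1 / num_reads^2 with a lower bound of `floor`.
    """
    if num_reads < 1:
        raise ValueError("num_reads must be at least 1")
    # Small inputs get a large (permissive) value; large inputs fall
    # back to the floor so the filter never becomes stricter than 2e-4.
    return max(floor, 1.0 / (num_reads * num_reads))


if __name__ == "__main__":
    for n in (2, 10, 70, 71, 1000):
        print(n, scaled_filtering_parameter(n))
```

Under these assumptions, two reads give a value of 0.25, well above the roughly 0.001 threshold below which no overlaps were produced, and the floor of 2e-4 takes over from about 71 reads onward.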