Open RagnarGrootKoerkamp opened 6 months ago
Hi @gmarcais @object022, I could reproduce this issue with the following inputs: `w = 2`, `k = 8`, `k0 = 5`, `w0 = 3`, `seq = "ACAAGCATACCAGT"`.
My first thought is to fall back to a normal minimizer for such windows but it feels like a patchy solution!
I can create a pull request if having this option seems useful.
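A minimal sketch of what such a fallback could look like. The function name `pick_kmer` and the use of lexicographic order for both k-mers and k0-mers are assumptions for illustration only; this is not the code in `reference_impl.py`, which may use a different ordering.

```python
def pick_kmer(window, k, k0):
    """Select one k-mer from `window` (length k + w - 1).

    Returns (start_index, used_fallback).
    Assumption: lexicographic order for both k-mers and k0-mers.
    """
    w = len(window) - k + 1

    def is_candidate(i):
        # A k-mer is a candidate if its smallest k0-mer is its first or last k0-mer.
        kmer = window[i:i + k]
        k0mers = [kmer[j:j + k0] for j in range(k - k0 + 1)]
        smallest = min(k0mers)
        return k0mers[0] == smallest or k0mers[-1] == smallest

    candidates = [i for i in range(w) if is_candidate(i)]
    used_fallback = not candidates
    if used_fallback:
        # No candidate in this window: behave like a plain minimizer.
        candidates = list(range(w))
    best = min(candidates, key=lambda i: window[i:i + k])
    return best, used_fallback
```

Under these assumptions, the first window of the sequence above (`"ACAAGCATA"`, with `k = 8`, `k0 = 5`) has no candidate k-mer and takes the fallback path, while the next window (`"CAAGCATAC"`) does not.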
@myprogrammerpersonality Best is to just default to `k0 = k - w`, assuming `k >= w + 4` or so.
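As a sketch, that default could be wrapped in a small helper. The name `default_k0` and the `min_k0 = 4` threshold (read off from "`k >= w + 4` or so") are hypothetical and not part of the reference implementation.

```python
def default_k0(k, w, min_k0=4):
    """Suggested default k0 = k - w, so that w >= k - k0 holds with equality.

    `min_k0` is an assumed lower bound taken from the "k >= w + 4 or so" remark.
    """
    k0 = k - w
    if k0 < min_k0:
        raise ValueError(f"k - w = {k0} is below the assumed minimum k0 of {min_k0}")
    return k0
```

For example, `default_k0(21, 10)` gives `11`, and `default_k0(8, 2)` gives `6`.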
Hi @object022,
I was reading the miniception paper to see if this could be used to improve the density of the minimizers used by SSHash.
SSHash uses parameters `k=21` and `w=10` (so `l=31`). From the paper, I got the impression that this should work well with `k0 = 5`, but as it turns out, Miniception seems to make the implicit assumption that `w >= k - k0`. (I didn't find an explicit mention of this anywhere.)

I tried changing the parameters in `reference_impl.py` to these values, which leads to a crash.

I believe this is caused by the following:
- Consider a `k+w-1` long string.
- Suppose the minimal `k0`-mer is exactly in the middle.
- When `k >> w`, all kmers include this k0-mer in the middle, but none contain it as their first or last k0-mer.

Do you have results that cover this range?
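To make the argument concrete, here is a small sketch that lists the positions at which a k0-mer is neither the first nor the last k0-mer of any k-mer in the window. The function name `uncovered_positions` is made up for illustration, and it only models k0-mer positions, not the actual ordering used by Miniception.

```python
def uncovered_positions(k, w, k0):
    """Start positions p of a k0-mer in a window of w consecutive k-mers
    (k + w - 1 characters) that no k-mer has as its first or last k0-mer."""
    last_offset = k - k0   # offset of the last k0-mer inside a k-mer
    kmer_starts = range(w)  # start positions of the w k-mers in the window
    bad = []
    for p in range(k + w - k0):  # every k0-mer start position in the window
        is_first = p in kmer_starts
        is_last = (p - last_offset) in kmer_starts
        if not (is_first or is_last):
            bad.append(p)
    return bad
```

With the SSHash-like parameters, `uncovered_positions(21, 10, 5)` returns the six middle positions `10` through `15`; if the minimal k0-mer lands on any of them, no k-mer in the window qualifies. The list is empty exactly when `w >= k - k0`, for instance `uncovered_positions(21, 10, 11)`.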
There is also a typo in the caption of figure 6 of the preprint, and I'm not sure which of the two figures is w=10 vs. w=100. (Sadly the Bioinformatics PDF is not available.)
Edit: I found the published pdf now. The caption of fig 6 still has a typo: it twice says 'fixed w and varying k', for both a, and for b and c. In c, w and k are both fixed once, according to the axes labels, so neither would be correct.
This makes the results a bit difficult to interpret; it's not quite clear whether the caption or axes labels are correct. Also, for example, it's not quite clear why the left and right half of b weren't merged into one plot, and why a has the larger value of w on the left.