Kingsford-Group / miniception


Miniception only works for `w >= k - k0` #1

Open RagnarGrootKoerkamp opened 6 months ago

RagnarGrootKoerkamp commented 6 months ago

Hi @object022,

I was reading the miniception paper to see if this could be used to improve the density of the minimizers used by SSHash.

SSHash uses parameters k=21 and w=10 (so l=31). From the paper, I got the impression that this should work well with k0 = 5, but as it turns out, Miniception seems to make the implicit assumption that w >= k - k0. (I didn't find an explicit mention of this anywhere.)
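As a quick illustration of the condition above, here is a minimal check. It is only a sketch: the function name is made up for this comment and is not part of reference_impl.py.

```python
def satisfies_miniception_constraint(w: int, k: int, k0: int) -> bool:
    """Return True when w >= k - k0, the condition discussed in this issue.

    Hypothetical helper for illustration; not part of the miniception code.
    """
    return w >= k - k0


# SSHash-style parameters from this comment: k - k0 = 16 exceeds w = 10.
print(satisfies_miniception_constraint(w=10, k=21, k0=5))  # prints False
```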

I tried changing the parameters in reference_impl.py to these values, which leads to a crash.

I believe this is caused by the following:

Do you have results that cover this range?

There is also a typo in the caption of Figure 6 of the preprint, and I'm not sure which of the two figures is w=10 vs. w=100. (Sadly the Bioinformatics PDF is not available.)

Edit: I found the published PDF now. The caption of Fig. 6 still has a typo: it says 'fixed w and varying k' twice, once for a and once for b and c. According to the axis labels, w and k are each fixed once in c, so neither would be correct.

This makes the results a bit difficult to interpret, since it's unclear whether the caption or the axis labels are correct. It's also unclear why the left and right halves of b weren't merged into one plot, and why a has the larger value of w on the left.

myprogrammerpersonality commented 4 weeks ago

Hi @gmarcais @object022, I could reproduce this issue with the following inputs: `w = 2`, `k = 8`, `k0 = 5`, `w0 = 3`, `seq = "ACAAGCATACCAGT"`.

My first thought is to fall back to a normal minimizer for such windows but it feels like a patchy solution!

I can create a pull request if having this option seems useful.
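For reference, the fallback I have in mind could look roughly like the sketch below: a plain minimizer that picks the smallest k-mer in every window of `w` consecutive k-mers. This is not taken from reference_impl.py; the function names and the hash-based ordering are assumptions for illustration only.

```python
import hashlib


def kmer_order(kmer: str) -> int:
    # Assumed ordering: a stable 64-bit hash of the k-mer (any total order would do).
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def plain_minimizers(seq: str, w: int, k: int) -> list[int]:
    """Return sorted start positions of the k-mers picked by a plain minimizer.

    Each window of w consecutive k-mers contributes its smallest k-mer
    under kmer_order; ties are broken by the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (kmer_order(kmers[i]), i)))
    return sorted(selected)


# The failing inputs from this comment.
print(plain_minimizers("ACAAGCATACCAGT", w=2, k=8))
```

By construction every window contains at least one selected position, so this path cannot hit the case where no k-mer is chosen.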

RagnarGrootKoerkamp commented 4 weeks ago

@myprogrammerpersonality Best is to just default to `k0 = k - w`, assuming `k >= w + 4` or so.
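A minimal sketch of that default, assuming a hypothetical helper name and treating the "4 or so" as an adjustable lower bound on `k0`:

```python
def default_k0(w: int, k: int, min_k0: int = 4) -> int:
    """Pick k0 = k - w, so that w >= k - k0 holds with equality.

    Hypothetical helper; min_k0 = 4 encodes the rough k >= w + 4 assumption.
    """
    k0 = k - w
    if k0 < min_k0:
        raise ValueError(f"k - w = {k0} is below min_k0 = {min_k0}")
    return k0


print(default_k0(w=10, k=21))  # prints 11 (SSHash-style parameters)
print(default_k0(w=2, k=8))    # prints 6 (inputs from the reproduction above)
```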