Tests showed that for HW, when the score is big enough, it may be more beneficial to start with a larger k! For example, when similarity (1 - score / read_length) is < 60%, we can get better results by just running edlib with k = read_length than by using k = -1.
So how can we use this to speed up edlib? If we had some way of very quickly and roughly estimating the similarity of two sequences up front, we could make a decision: "they seem to be pretty dissimilar, so let's use k = read_length instead of k = -1".
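
As a rough illustration of that decision, here is a minimal sketch in Python. It is not part of edlib: the function names, the k-mer length of 4, and the use of shared k-mers as the estimator are all assumptions made for the example. Note also that the fraction of shared k-mers is only a proxy for 1 - score / read_length (a single mismatch can remove up to `kmer_len` k-mers), so the 0.6 threshold here is a placeholder that would need to be calibrated against real measurements.

```python
def kmer_set(seq, kmer_len):
    """Return the set of distinct k-mers (substrings of length kmer_len) in seq."""
    return {seq[i:i + kmer_len] for i in range(len(seq) - kmer_len + 1)}


def estimate_similarity(read, target, kmer_len=4):
    """Hypothetical cheap estimator: the fraction of the read's distinct
    k-mers that occur anywhere in the target. This is NOT the same quantity
    as 1 - score / read_length, only a rough stand-in for it."""
    read_kmers = kmer_set(read, kmer_len)
    if not read_kmers:
        # Read is shorter than kmer_len, so there is nothing to compare.
        return 0.0
    target_kmers = kmer_set(target, kmer_len)
    return len(read_kmers & target_kmers) / len(read_kmers)


def choose_initial_k(read, target, threshold=0.6, kmer_len=4):
    """Pick the k to pass to the aligner: read length when the sequences
    look dissimilar, otherwise -1 (no limit given up front).
    The threshold is an assumed placeholder, not a measured value."""
    if estimate_similarity(read, target, kmer_len) < threshold:
        return len(read)
    return -1
```

The returned value could then be passed as the `k` argument of the alignment call, for example `edlib.align(read, target, mode="HW", k=choose_initial_k(read, target))` with the Python bindings. Since the HW score can never exceed the read length, k = read_length still always yields a result.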