If I understand properly you are running bitap algorithm to find position, then use levenshtein to find edit distance.
Fuzzymatch.prototype._query = function(listName, query) {
//..
var position = self._match_main(data[i].name.toLowerCase(), query, 0);
//...
var distance = self.levenshtein(datum.name, query);
var score = distance + position;
//...
}
However bitap by itself is a levenshtein distance algorithm, so you are computing the score twice. A wiser option would be to modify _match_bitap_ to output it's score (and if you wish, position) :
//d is levenshtein distance, j is position
var score = match_bitapScore_(d, j - 1);
(You could however use the levenshtein function to compute match on string >32 char, because that is a limitation of bitap.)
Finaly that line:
var score = distance + position;
It means that an offset on 1 in position is as costly as an error in edit distance.
It's more severe than usual, and if it's indeed what you need, then there's no need to compute position at all, just compute distance from position 0 (error in distance and in position have same cost)
A typical usage is like this (score=distance+position/10 with your settings)
Lastly: Performant to at least ~10,000 items.
My own experience with bitap suggest it's has trouble handling that many data in interactive time.
have you tested that many item without if (num_matchs > 100) { break; } ? Supose you search for a string that is so uncommon that all of 10000 record need to be searched ?
If I understand properly you are running bitap algorithm to find position, then use levenshtein to find edit distance.
However bitap by itself is a levenshtein distance algorithm, so you are computing the score twice. A wiser option would be to modify
_match_bitap_
to output it's score (and if you wish, position) :(You could however use the levenshtein function to compute match on string >32 char, because that is a limitation of bitap.)
Finaly that line:
It means that an offset on 1 in position is as costly as an error in edit distance. It's more severe than usual, and if it's indeed what you need, then there's no need to compute position at all, just compute distance from position 0 (error in distance and in position have same cost)
A typical usage is like this (score=distance+position/10 with your settings)
Lastly:
Performant to at least ~10,000 items.
My own experience with bitap suggest it's has trouble handling that many data in interactive time. have you tested that many item withoutif (num_matchs > 100) { break; }
? Supose you search for a string that is so uncommon that all of 10000 record need to be searched ?