Open punkpeye opened 4 days ago
what "valid matches" have a score of 0.2?
there's not a way to prevent these matches. but you can just write your own filter to remove them based on result.indexes
function valid_match(result) {
  const indexes = result.indexes
  let sequenceStart = 0
  let sequenceLength = 1
  for (let i = 1; i < indexes.length; i++) {
    if (indexes[i] === indexes[i-1] + 1) {
      sequenceLength++
    } else {
      if (sequenceLength === 1) return false
      sequenceStart = i
      sequenceLength = 1
    }
  }
  return sequenceLength !== 1
}
fuzzysort.go('shoe', ['i am designing an app that allows to chat with multiple LLMs at once.']).filter(valid_match)
I think there might be a mistake in the code? I see you have sequenceStart, but that's not referenced anywhere.
Anyway, I get the idea – what does indexes actually contain? The position of a match? How do I tell the length of the match then?
Okay, I figured it out:
const extractMatches = (input: string, indexes: number[]): string[] => {
  const result: string[] = [];
  let currentSubstring = '';
  for (const [i, element] of input.split('').entries()) {
    if (indexes.includes(i)) {
      currentSubstring += element;
    } else if (currentSubstring) {
      result.push(currentSubstring);
      currentSubstring = '';
    }
  }
  if (currentSubstring) {
    result.push(currentSubstring);
  }
  return result;
};
.filter((result) => {
  const matches = extractMatches(result.target, result.indexes.slice());
  return matches.some((match) => match.length > 1);
})
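Note that the two filters in this thread are not equivalent: valid_match rejects a result as soon as any matched character stands alone, while the extractMatches filter keeps a result if at least one matched substring is longer than one character. Here is a minimal sketch of the difference, assuming result.indexes is a sorted array of matched character positions in the target. The helper names toRuns, everyRunLong, and someRunLong are hypothetical and not part of fuzzysort.

```javascript
// Group sorted match positions into runs of consecutive indexes.
// Each run is { start, length }. Hypothetical helper, not part of fuzzysort.
function toRuns(indexes) {
  const runs = []
  for (const index of indexes) {
    const last = runs[runs.length - 1]
    if (last && index === last.start + last.length) {
      last.length++ // extends the current run
    } else {
      runs.push({ start: index, length: 1 }) // starts a new run
    }
  }
  return runs
}

// Strict: every run must be at least minLength long (behaves like valid_match).
function everyRunLong(indexes, minLength = 2) {
  const runs = toRuns(indexes)
  return runs.length > 0 && runs.every((run) => run.length >= minLength)
}

// Lenient: one sufficiently long run is enough (behaves like the extractMatches filter).
function someRunLong(indexes, minLength = 2) {
  return toRuns(indexes).some((run) => run.length >= minLength)
}

// Made-up positions for illustration: one run of three, then one lone character.
const sample = [4, 5, 6, 20]
console.log(everyRunLong(sample)) // false
console.log(someRunLong(sample))  // true
```

Which predicate fits depends on whether a lone matched character should disqualify the whole result or merely be ignored.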
Thank you!
This mostly fixed the issue... but you can still see the issue in highlight() logic. What would be the way to filter out those single-character highlights without re-implementing the entire highlight logic?
I think there might be a mistake in the code? I see you have sequenceStart, but that's not referenced anywhere.
lol oops. it's AI generated
This mostly fixed the issue... but you can still see the issue in highlight() logic. What would be the way to filter out those single-character highlights without re-implementing the entire highlight logic?
what's the issue with highlight? if you filtered out single-character matches, you shouldn't be trying to highlight them
There might be a legit result where some highlighted snippets are multiple characters and others are lone single characters. I would want to filter out those that are lone single characters.
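For that mixed case, one option is to build the highlighted string from result.target and result.indexes directly and wrap only the runs that are long enough. This is a rough sketch, not fuzzysort's own highlight logic: highlightRuns is a hypothetical helper, and it assumes the indexes are sorted character positions in the target.

```javascript
// Wrap only runs of at least minLength consecutive matched characters.
// Hypothetical helper working on plain data (target string + sorted indexes);
// it does not call into fuzzysort.
function highlightRuns(target, indexes, minLength = 2, open = '<b>', close = '</b>') {
  const matched = new Set(indexes)
  let out = ''
  let i = 0
  while (i < target.length) {
    if (!matched.has(i)) {
      out += target[i++]
      continue
    }
    // Find the end of this run of consecutive matched positions.
    let end = i
    while (end < target.length && matched.has(end)) end++
    const run = target.slice(i, end)
    // Lone characters (shorter than minLength) are emitted without markup.
    out += run.length >= minLength ? open + run + close : run
    i = end
  }
  return out
}

// Made-up indexes for illustration: "sh" matched together, one "e" matched alone.
console.log(highlightRuns('shop here', [0, 1, 6]))
// → <b>sh</b>op here
```

It would be called as highlightRuns(result.target, result.indexes). If the installed fuzzysort version's highlight accepts a callback that receives each matched substring (worth checking in its README), the same length check could go inside that callback instead of a separate helper.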
This is being matched for query 'shoe'
What's the best way to prevent such matches?
I've experimented with different threshold values, like threshold: 0.2, but that didn't get me far as it quickly started removing valid matches while still keeping these.