farzher / fuzzysort

Fast SublimeText-like fuzzy search for JavaScript.
https://rawgit.com/farzher/fuzzysort/master/test/test.html
MIT License
3.92k stars 158 forks

How to prevent single character matches? #136

Open punkpeye opened 4 days ago

punkpeye commented 4 days ago

This is being matched for query 'shoe'

[Screenshot 2024-09-17 at 8 42 56 AM]

What's the best way to prevent such matches?

I've experimented with different threshold values, like threshold: 0.2, but that didn't get me far: it quickly started removing valid matches while still keeping these.

farzher commented 4 days ago

what "valid matches" have a score of 0.2?

there's not a way to prevent these matches. but you can just write your own filter to remove them based on result.indexes

farzher commented 4 days ago

function valid_match(result) {
  const indexes = result.indexes
  let sequenceStart = 0
  let sequenceLength = 1

  for (let i = 1; i < indexes.length; i++) {
    if (indexes[i] === indexes[i-1] + 1) {
      sequenceLength++
    } else {
      if (sequenceLength === 1) return false
      sequenceStart = i
      sequenceLength = 1
    }
  }

  return sequenceLength !== 1
}

fuzzysort.go('shoe', ['i am designing an app that allows to chat with multiple LLMs at once.']).filter(valid_match)

punkpeye commented 3 days ago

I think there might be a mistake in the code? I see you have sequenceStart, but that's not referenced anywhere.

Anyway, I get the idea – what does indexes actually contain? The position of a match? How do I tell the length of the match then?
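
For readers with the same question: judging by the filter above, result.indexes appears to hold the position in result.target of each matched character, so the length of a match is the length of a run of consecutive positions. A minimal sketch under that assumption follows. toRuns is a hypothetical helper, not part of fuzzysort, and it sorts the positions first in case they do not arrive in ascending order.

```javascript
// Hypothetical helper, not part of fuzzysort: groups matched character
// positions into runs of consecutive indexes.
function toRuns(indexes) {
  // Copy and sort so the input is left untouched and the order is guaranteed.
  const sorted = Array.from(indexes).sort((a, b) => a - b)
  const runs = []
  for (const index of sorted) {
    const last = runs[runs.length - 1]
    if (last && index === last.start + last.length) {
      last.length++ // index continues the current run
    } else {
      runs.push({ start: index, length: 1 }) // index starts a new run
    }
  }
  return runs
}

// Positions 2, 3, 4 form one run of three characters; 9 is a lone character.
console.log(toRuns([2, 3, 4, 9]))
```

Each run's start and length can then be used with target.slice(start, start + length) to recover the matched text.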

punkpeye commented 3 days ago

Okay, I figured it out:

const extractMatches = (input: string, indexes: number[]): string[] => {
  const result: string[] = [];
  let currentSubstring = '';

  for (const [i, element] of input.split('').entries()) {
    if (indexes.includes(i)) {
      currentSubstring += element;
    } else if (currentSubstring) {
      result.push(currentSubstring);
      currentSubstring = '';
    }
  }

  if (currentSubstring) {
    result.push(currentSubstring);
  }

  return result;
};

.filter((result) => {
  const matches = extractMatches(result.target, result.indexes.slice());

  return matches.some((match) => match.length > 1);
})

Thank you!

punkpeye commented 3 days ago

This mostly fixed the issue... but you can still see the issue in highlight() logic. What would be the way to filter out those single-character highlights without re-implementing the entire highlight logic?

farzher commented 2 days ago

> I think there might be a mistake in the code? I see you have sequenceStart, but that's not referenced anywhere.

lol oops. it's AI generated

> This mostly fixed the issue... but you can still see the issue in highlight() logic. What would be the way to filter out those single-character highlights without re-implementing the entire highlight logic?

what's the issue with highlight? if you filtered out single-character matches, you shouldn't be trying to highlight them

punkpeye commented 1 day ago

There might be a legit result where some highlighted snippets are multiple characters and others are lone single characters. I would want to filter out those that are lone single characters.
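
One possible approach, sketched here without relying on highlight() itself, is to wrap only the runs of two or more consecutive matched positions and leave lone matched characters unwrapped. highlightRuns and its open and close parameters are hypothetical names, not fuzzysort API, and the sketch assumes indexes holds the matched character positions in target.

```javascript
// Hypothetical helper, not part of fuzzysort: wraps runs of two or more
// consecutive matched characters and leaves lone matched characters plain.
function highlightRuns(target, indexes, open = '<b>', close = '</b>') {
  const matched = new Set(indexes)
  let out = ''
  let i = 0
  while (i < target.length) {
    if (!matched.has(i)) {
      out += target[i++] // unmatched character, copy as-is
      continue
    }
    // Find the end of the run of consecutive matched positions starting at i.
    let end = i
    while (matched.has(end + 1)) end++
    const run = target.slice(i, end + 1)
    out += run.length > 1 ? open + run + close : run
    i = end + 1
  }
  return out
}

// The lone "s" at position 0 stays plain; the run at positions 6 to 9 is wrapped.
console.log(highlightRuns('sushi shoes', [0, 6, 7, 8, 9]))
// prints "sushi <b>shoe</b>s"
```

With a fuzzysort result this would be called as highlightRuns(result.target, result.indexes). fuzzysort's highlight() is also documented as accepting a callback that receives each matched substring. If that holds for the installed version, returning short substrings unwrapped from the callback may achieve the same effect with less code, though that route is not verified here.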