DominikDoom / a1111-sd-webui-tagcomplete

Booru style tag autocompletion for AUTOMATIC1111's Stable Diffusion web UI
MIT License
2.56k stars 306 forks source link

[Feature request] Fuzzy matching #279

Open Kenqr opened 6 months ago

Kenqr commented 6 months ago

Is it possible to change the matching method into fuzzy matching, so you can type "detco" to match "detached collar" or "bosh" to match "bookshelf"? I don't like to move my right hand between jkl and arrow keys often when typing, that means for some tags I need to type out most of it to make it on top of the list. Ex. "detached_co" for "detached_collar"

Many programs are using this matching method, like Visual Studio Code, PowerToys Run, Obsidian, etc. I think adding fuzzy matching would make the user experience much better.

DominikDoom commented 5 months ago

It's definitely a consideration for the future. I tried implementing it in the past, however I quickly encountered issues with balancing the fuzzy search results and other sort/filter conditions already in place and ultimately abandoned it for working on other features.

While the basic fuzzy search functionality is quick to implement using one of the many libraries that exist for it, it becomes really hard to balance for all aspects that TAC has to consider. Just to name a few issues:

This doesn't mean it can't be done, just that it's not straightforward and would likely require quite a bit of work to fit in nicely with the rest. Technically, each of the sources above could have its filter condition (at least partly) replaced by a fuzzy matcher, but that would still leave the displaying and sorting issues, which definitely need some custom logic. See the image below for the bookshelf example: The matches are mostly sensible, but not in a useful order based on the match, and due to the way aliases are implemented, TAC wrongly assumes most of the results come from one (but no matching mapping for display is found) image

Another option would be that the fuzzy matcher takes over completely and handles sorting / displaying on its own, however that could lead to issues with the matched tags not necessarily being common enough to be understood by models. Such a mode could instead work well if normal word lists are used instead of tags, which could be a valid use case too depending on the model.

For the near future, I instead opted for a different method to require less typing, which is currently in "beta" on the https://github.com/DominikDoom/a1111-sd-webui-tagcomplete/tree/feature-sort-by-frequent-use branch (hopefully merged soon, after I find the time to iron out the last few kinks). While the filtering is still exact there, it favors frequently used tags over others, so a tag you like to use in your prompts will rise to the top automatically over time. This would at least handle the "I need to type out most of it to make it on top of the list" side. Not abbreviations or typos though.

But once that's released, I'm definitely up for giving this a proper try.

Kenqr commented 5 months ago

Looks like it's more complicated than I thought.

For the first stage of the implementation, I think you can just use the current sorting order (sort by post count), and require the first letter of the input to match one of the initial letters of the tags. Using bosh as an example, the first letter is b, which doesn't match cowboy_shot's initial letters c and s. In this case, bookshelf would actually be on top of the list, so it looks pretty promising to me.

Btw I'm not a native speaker and don't quite understand what this sentence means: "and due to the way aliases are implemented, TAC wrongly assumes most of the results come from one (but no matching mapping for display is found)", so I could be missing something important.

Sort by frequency sounds like a good idea. I guess it could solve a large part of my issue here, so I'm looking forward to it. Thank you.

DominikDoom commented 5 months ago

For the first stage of the implementation, I think you can just use the current sorting order (sort by post count), and require the first letter of the input to match one of the initial letters of the tags. Using bosh as an example, the first letter is b, which doesn't match cowboy_shot's initial letters c and s. In this case, bookshelf would actually be on top of the list, so it looks pretty promising to me.

That would be a good workaround, yeah (and also more similar to how it currently behaves). But it doesn't work for every case out of the box, e.g. tags containing parentheses or special syntax for Loras, wildcards etc. that starts with a different letter. That wouldn't be too hard to check, though.

Btw I'm not a native speaker and don't quite understand what this sentence means: "and due to the way aliases are implemented, TAC wrongly assumes most of the results come from one (but no matching mapping for display is found)", so I could be missing something important.

This is just a technical detail, nothing important. In different terms, it's that the current method assumes that an alias or translation was typed if whatever you entered couldn't be found in the result text. Without fuzzy matching, this would always be true, so the display logic could rely on that and add the arrow etc. With fuzzy matching, what you typed can still match a tag despite the letters being incomplete or even out of order, so this assumption doesn't work anymore. It's not hard to fix and was just meant as an explanation for why the results in the image look a bit broken.

Symbiomatrix commented 5 months ago

If I may weigh in, the limited addition of regex for models in #275 could probably be extended with relative ease to tags, along with the handy ^/$ positional anchors. ^det*co for example is two more letters to type but would get the job done without having to rely on magic libraries. Sure beats typing whole words in any case.

DominikDoom commented 5 months ago

If I may weigh in, the limited addition of regex for models in #275 could probably be extended with relative ease to tags, along with the handy ^/$ positional anchors. ^det*co for example is two more letters to type but would get the job done without having to rely on magic libraries. Sure beats typing whole words in any case.

That would be a possibility, but has some of the same implementation challenges as the "magic" approach with sorting and display logic. So from a work standpoint, it wouldn't be that much easier, while also having the downside that users need to know their way around regex syntax.

Some of the libraries I toyed around with actually already support an autocomplete-like mode that takes word position into account (or can be configured in such a way), so it could be the best of both worlds if done properly. Probably also faster than any custom way I could come up with. The pure regex way would be a quick and dirty solution though if some challenges with the library approach pop up that I can't solve in a timely manner.

Symbiomatrix commented 5 months ago

Hmm, I guess that (^|[^a-z]) clause in autocomplete()'s searchRegex is supposed to prevent some of the clutter. Thought it'd be a oneliner there, but apparently this just finds any matches and doesn't store the exact location (also, the wildcard needs to stop at commas for the sake of aliases); addResultsToList() does its own search for which of base / alias / translation actually matched, and has a bit of a meltdown.

Is that what you meant by sort and display logic?

DominikDoom commented 5 months ago

Yeah that's mostly it, the addResultsToList() function does some lookup/logic on its own to determine the relation between the filtered tag subset and what was typed. Especially for translations as these can be looked up and displayed even if nothing from the typed tag word matches it.

This could probably be integrated with the result object in some way, e.g. that the filter conditions in the initial autocomplete have side effects to mark which one matched, but it would likely be easier to extend the logic in addResultsToList() a bit to handle a fuzzy match and not default to the alias processing in that case.

Symbiomatrix commented 5 months ago

Probably right it's the easier approach: I went with saving the pattern under result, and translations were definitely a hassle, though I think gpt refactored that part and added pattern extraction pretty well (didn't implement translations in my test though). One thing I noticed is that in fuzzy you'd probably lose the "bold pattern matched" bit from addResultsToList(), potentially yielding unclear matches.

TACWildcardTag

TACWildcardTagB

DominikDoom commented 5 months ago

If I end up using a library for it, most have a highlighter of their own to use. If not, it won't be hard to extend the current solution to work for all matches, even just highlighting the first time each typed letter shows up might be enough.