[Bug]: fuse.js extended search unintuitive results

christian-byrne commented 2 months ago

Frontend Version

1.2.28

Expected Behavior

Search to behave similar to other search features.

Actual Behavior

From the fuse.js docs on extended search option, when enabled:

White space acts as an AND operator

This leads to unexepcted behavior when searching multiple words, like getting this when searching "Save Image": Selection_280

Or this when searching "VAE Decode":

Selection_281

This was pointed out by SirVeggie on discord.

Steps to Reproduce

Search multi-word phrases in search

Debug Logs

Browser Logs

.

What browsers do you use to access the UI ?

Google Chrome

Other

When extended search is enabled, "VAE Decode" matches to "VAE Decode" with a score of 0.06403390388180329 and "VAEDecodeAudio" with a score of 0.0155. When extended search is disabled, "VAE Decode" matches to "VAE Decode" with a perfect score (0) and "VAEDecodeAudio" with a score of 0.3.

christian-byrne commented 2 months ago

The suggested solution was to keep useExtendedSearch enabled, since the tokenization improves accuracy of mixed-word-order searches, and to implement a custom sort function that sorted matches with very similar scores based on length. It was also suggested that the issue should be solved by using Bayesian search with telemetry data.

SirVeggie commented 2 months ago

Here's my suggestion for the custom sort function that seems to work well: Math.abs(a.score - b.score) > 0.01 ? a.score - b.score : (a.item[1]['v']['length'] - b.item[1]['v']['length'] || a.idx - b.idx)

christian-byrne commented 2 months ago

Here's my suggestion for the custom sort function that seems to work well: Math.abs(a.score - b.score) > 0.01 ? a.score - b.score : (a.item[1]['v']['length'] - b.item[1]['v']['length'] || a.idx - b.idx)

Comparison of current method vs. sorting function using all token permutations. It seems better in almost every scenario.

huchenlei commented 2 months ago

Thanks for the comparison function and benchmark!

huchenlei commented 2 months ago

I think the sort function is a bit off in some cases:

christian-byrne commented 2 months ago

I think the sort function is a bit off in some cases:

Are you getting "CLIP Text Encode (Prompt)" as the first option normally?

For me this is with vs. without sorting function:

Selection_281

I only get "CLIP Text Encode (Prompt)" as first result after I disable the extended search in fuse config.

christian-byrne commented 2 months ago

Extended search vs. no extended search comparison

Extended search has a lot of upsides when the ordering of words is incorrect in search phrase e.g. "Encode VAE" or "Image Save"

huchenlei commented 2 months ago

I do get the correct one when "CLIP Text Encode" is used as query before. And that one is used in the playwright test unfortunately.

christian-byrne commented 2 months ago

I see, it's "CLIPTextEncode." I misunderstood

https://github.com/Comfy-Org/ComfyUI_frontend/blob/f2de9b0d3c547c814baeb736dbecaedd3c41705e/browser_tests/nodeSearchBox.spec.ts#L39

Selection_282

SirVeggie commented 2 months ago

the reason why it's correct in the test you mentioned, is that when you type the exact node id, the old method is better in this outlier case, if you're missing even one letter, the old method results are equally bad.

Fixing exact matches can be done fairly easily by adding an additional condition Math.min(a.score, b.score) < 0.0001: sortFn: (a, b) => Math.min(a.score, b.score) < 0.0001 || Math.abs(a.score - b.score) > 0.01 ? a.score - b.score : (a.item[1]['v']['length'] - b.item[1]['v']['length'] || a.idx - b.idx)

I have another version that uses the node index instead of node title length. this one is "less correct", since it doesn't prioritize match accuracy, but in exchange it is closer to the og. sort order of the old UI. I think it's determined by the order in which the nodes are loaded? I'm not exactly sure. it also fixes many of the same problems the previous version aimed to fix.

this one basically prioritizes core nodes first, and based on their definition/load order. as a downside it's slightly worse for custom nodes, as the load/definition order of those is pretty "random". it should yield results that are more familiar, even if less accurate.

sortFn: (a, b) => Math.abs(a.score - b.score) > 0.01 ? a.score - b.score : a.idx - b.idx

Comfy-Org / ComfyUI_frontend