pmario opened 10 months ago
Something else to add to the mix. In Fuzzy Search – use diff-match-patch or other algorithms, I discuss an alternative implementation I developed for a StackOverflow answer, using an extension of Dice's Coefficient to include consideration of the multiplicities of bigrams and not just their existence.
You can see running versions of the Levenshtein distance and my String similarity pseudo-metric running on the Ramda REPL. (I'm using Ramda here only for the sorting; we can remove it easily enough.) The Levenshtein code is pulled randomly from the web. The string similarity code is my own. It's written in modern JS and would need to be refactored to work in ES5; that shouldn't be a big deal.
You can comment/uncomment the various search strings in the two versions, or edit to add your own. I limited the output to the top 50, which you can change in the last lines. We could easily change to limit to a certain metric threshold.
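For concreteness, here is a rough sketch of the bigram-multiplicity idea in plain JS. It is illustrative only, not the exact code from the REPL:

```js
// Dice-style similarity that counts bigram multiplicities, not just their existence.
// Illustrative sketch only -- the REPL version differs in details.
function bigramCounts(str) {
  var counts = {};
  for (var i = 0; i < str.length - 1; i++) {
    var bg = str.substring(i, i + 2);
    counts[bg] = (counts[bg] || 0) + 1;
  }
  return counts;
}

function similarity(a, b) {
  var ca = bigramCounts(a.toLowerCase()),
      cb = bigramCounts(b.toLowerCase()),
      shared = 0, total = 0, bg;
  for (bg in ca) {
    total += ca[bg];
    if (cb[bg]) { shared += Math.min(ca[bg], cb[bg]); }
  }
  for (bg in cb) { total += cb[bg]; }
  // Dice's coefficient, 2 * |A n B| / (|A| + |B|), using multiset intersection
  return total === 0 ? 0 : (2 * shared) / total;
}

// similarity("widget", "wikitext") -> roughly 0.17
```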
I'm not sure if this offers anything helpful here, but it seems to be related.
Likewise, I don't know if this will be helpful, but I've done pretty extensive comparisons of many fuzzy and full-text search libs in the process of writing https://github.com/leeoniya/uFuzzy
Hi @leeoniya, thank you. uFuzzy looks very useful, and I'm glad to see that it is a success. It is small enough that we might consider including it in the TiddlyWiki core (the other alternative is to integrate it via a plugin).
A key criterion for inclusion of a feature in the core is "universality", which roughly means that a plausible case can be made that the feature is beneficial to all users of TiddlyWiki (regardless of language, culture, etc.). An opposing criterion is, of course, the desire not to make the core too complex.
In the case of uFuzzy, that leads to a question: what I'd be very interested to understand better is whether including it would be useful for, say, our Chinese-speaking users. I'll tag @linonetwo, @oeyoews, @BramChen and @oflg, who I think are our most active Chinese-speaking contributors. If uFuzzy has the potential to be useful to Chinese speakers then I'd very much like to explore integrating it in the core.
uFuzzy uses regular expressions for search and matching, so it theoretically supports all languages. I did a small test and it seems to work well. It looks like it could be used in TiddlyWiki.
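For what it's worth, a small test along these lines might look like the sketch below. The calls follow my reading of the uFuzzy README (filter/info/sort), and the package name is an assumption, so please check against the actual docs:

```js
// Minimal smoke test of uFuzzy against a list of tiddler titles.
var uFuzzy = require("@leeoniya/ufuzzy"); // assumption: CommonJS build under this package name

var haystack = [
  "HelloThere",
  "TiddlerFields",
  "WidgetMessages",
  "ListWidget",
  "FilterOperators"
];

var uf = new uFuzzy();
var needle = "widget";

var idxs = uf.filter(haystack, needle);        // candidate indices (may be null)
if (idxs && idxs.length > 0) {
  var info = uf.info(idxs, haystack, needle);  // match ranges, error counts, etc.
  var order = uf.sort(info, haystack, needle); // ranking of the matches

  order.forEach(function(i) {
    console.log(haystack[info.idx[i]]);        // logs the matching titles in ranked order
  });
}
```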
I think we do not need a new library that is language-dependent by default.
I think the existing diff-match-patch library already allows us to implement a fuzzy search, as mentioned here on GitHub.
GH issue from @yaisog
I think it should be possible to use the match-demo code to implement a fuzzy search. @yaisog mentioned the "fuzzy location" parameter, but it is optional, so setting it to 0 for our title search should be perfectly fine.
The only parameter that needs to be adjusted on a "per wiki / per language" basis may be the "match threshold", since it may depend on the language used in the wiki.
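If we go that route, the relevant knobs would look roughly like the sketch below. It uses the documented diff-match-patch match API (match_main, Match_Threshold, Match_Distance); the require path and the threshold value are placeholders, not tuned defaults:

```js
// Sketch: fuzzy-matching a search term against a tiddler title with diff-match-patch.
var DiffMatchPatch = require("diff-match-patch"); // placeholder path; TW already bundles the library
var dmp = new DiffMatchPatch();

dmp.Match_Threshold = 0.4;  // the per-wiki / per-language tuning knob discussed above
dmp.Match_Distance = 1000;  // how far from the expected location a match may drift

// match_main(text, pattern, loc) returns the index of the best match, or -1.
// Passing loc = 0 corresponds to the "fuzzy location" of 0 mentioned above.
function titleMatches(title, term) {
  return dmp.match_main(title.toLowerCase(), term.toLowerCase(), 0) !== -1;
}

console.log(titleMatches("WidgetMessages", "widgt")); // expected: true with this threshold
```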
Also see
If there is a standard usage in the core to handle fuzzy search, I can patch it to support Chinese fuzzy search.
I already have a version in https://github.com/tiddly-gittly/tiddlywiki-plugins/tree/master/src/pinyin-fuzzy-search that adds a pinyinfuse
filter operator. I just don't have a way to apply it to the standard search in TiddlyWiki. It is currently only being used in my command palette plugin.
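For context, a TiddlyWiki filter operator module has roughly the shape below. This is only a skeleton: the tiddler title and the fuzzyScore helper are stand-ins for whatever scoring (pinyin, uFuzzy, edit distance, ...) the plugin actually uses:

```js
/*\
title: $:/plugins/example/fuzzy-search/fuzzy.js
type: application/javascript
module-type: filteroperator

Skeleton of a fuzzy filter operator; names and scoring are illustrative only.

\*/
(function() {
"use strict";

// Hypothetical scoring helper -- replace with pinyin / uFuzzy / edit-distance logic.
function fuzzyScore(title, term) {
  return title.toLowerCase().indexOf(term.toLowerCase()) !== -1 ? 1 : 0;
}

exports.fuzzy = function(source, operator, options) {
  var term = operator.operand || "",
      results = [];
  source(function(tiddler, title) {
    if(fuzzyScore(title, term) > 0) {
      results.push(title);
    }
  });
  return results;
};

})();
```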
As the title says: "The core search result list should return relevant info for basic search terms - early"
What are basic search terms for new users?
tiddler, wiki, tiddlywiki, filter, widget, macro, list, theme, procedure, template, tab, json, data, plugin, save, import, image, edition, link, field, tag, wikitext, ...
The following screenshots show 45 results for every basic term mentioned above.
```
[!is[system]search:title<userInput>] :sort:integer[levenshtein<userInput>] :and[limit[250]]
```
You can decide for yourself what makes more sense.
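For reference, the `levenshtein` operator used in that filter is not a core operator; the value it would sort by is the classic edit distance, something like:

```js
// Classic Levenshtein (edit) distance, iterative two-row version.
// Shown only to illustrate what :sort:integer[levenshtein<userInput>] would sort by;
// the operator itself is not part of the core.
function levenshtein(a, b) {
  var prev = [], curr = [], i, j;
  for (j = 0; j <= b.length; j++) { prev[j] = j; }
  for (i = 1; i <= a.length; i++) {
    curr[0] = i;
    for (j = 1; j <= b.length; j++) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1,         // deletion
                         curr[j - 1] + 1,     // insertion
                         prev[j - 1] + cost); // substitution
    }
    prev = curr.slice();
  }
  return prev[b.length];
}

// levenshtein("widget", "wikitext") === 4
```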