TiddlyWiki / TiddlyWiki5

A self-contained JavaScript wiki for the browser, Node.js, AWS Lambda etc.
https://tiddlywiki.com/

[BUG] The core search result list should return relevant info for basic search terms - early #7917

Open pmario opened 10 months ago

pmario commented 10 months ago

As the title says: "The core search result list should return relevant info for basic search terms - early"

What are basic search terms for new users?

tiddler, wiki, tiddlywiki, filter, widget, macro, list, theme, procedure, template, tab, json, data, plugin, save, import, image, edition, link, field, tag, wikitext, ...

The following screenshots show 45 results for every basic term mentioned above.

You can decide for yourself what makes more sense.

Screenshots: 01-tiddler, 02-wiki, 03-tiddlywiki, 04-filter, 05-widget, 06-macro, 07-list, 08-theme, 09-procedure, 10-template, 11-tab, 12-json, 13-data, 14-plugin, 15-save, 16-import, 17-image, 18-edition, 19-link, 20-field, 21-tag, 22-wikitext

CrossEye commented 10 months ago

Something else to add to the mix. In "Fuzzy Search – use diff-match-patch or other algorithms", I discuss an alternative implementation I developed for a StackOverflow answer, using an extension of Dice's Coefficient that considers the multiplicities of bigrams and not just their existence.

You can see versions of the Levenshtein distance and my string similarity pseudo-metric running on the Ramda REPL. (I'm using Ramda here only for the sorting; we can remove it easily enough.) The Levenshtein code is pulled randomly from the web. The string similarity code is my own. It's written in modern JS and would need to be refactored to work in ES5; that shouldn't be a big deal.

You can comment/uncomment the various search strings in the two versions, or edit to add your own. I limited the output to the top 50, which you can change in the last lines. We could easily change to limit to a certain metric threshold.

I'm not sure if this offers anything helpful here, but it seems to be related.
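For readers unfamiliar with the idea, a multiset variant of Dice's coefficient over bigrams might look roughly like the sketch below. This is not the code from the StackOverflow answer or the Ramda REPL; the function names `bigramCounts`, `diceSimilarity` and `rankTitles` are hypothetical, and lowercasing the input is an assumed normalization.

```javascript
// Count the bigrams of a string as a multiset (Map from bigram to count).
// Lowercasing is an assumption made for this sketch.
function bigramCounts(s) {
  const counts = new Map();
  const t = s.toLowerCase();
  for (let i = 0; i < t.length - 1; i++) {
    const bigram = t.slice(i, i + 2);
    counts.set(bigram, (counts.get(bigram) || 0) + 1);
  }
  return counts;
}

// Multiset Dice coefficient: 2 * |A ∩ B| / (|A| + |B|), where the
// intersection takes the minimum count of each shared bigram.
function diceSimilarity(a, b) {
  const ca = bigramCounts(a);
  const cb = bigramCounts(b);
  let total = 0;
  let shared = 0;
  for (const n of ca.values()) total += n;
  for (const n of cb.values()) total += n;
  // Strings shorter than two characters have no bigrams at all.
  if (total === 0) return a.toLowerCase() === b.toLowerCase() ? 1 : 0;
  for (const [bigram, n] of ca) shared += Math.min(n, cb.get(bigram) || 0);
  return (2 * shared) / total;
}

// Hypothetical helper: rank titles by similarity and keep the top `limit`.
function rankTitles(query, titles, limit = 50) {
  return titles
    .map((title) => ({ title, score: diceSimilarity(query, title) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

Counting multiplicities matters for repeated bigrams: a set-based version would score "aaaa" against "aa" as a perfect match, while the multiset version gives 0.5.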

leeoniya commented 10 months ago

likewise, i don't know if this will be helpful, but i've done pretty extensive comparisons of many fuzzy and fulltext search libs in the process of writing https://github.com/leeoniya/uFuzzy

Jermolene commented 10 months ago

Hi @leeoniya thank you, uFuzzy looks very useful, and glad to see that it is a success. It is small enough that we might consider including it in the TiddlyWiki core (the other alternative is to integrate it via a plugin).

A key criterion for inclusion of a feature in the core is "universality", which roughly means that a plausible case can be made that the feature is beneficial to all users of TiddlyWiki (regardless of language, culture, etc.). An opposing criterion is the desire not to make the core too complex, of course.

In the case of uFuzzy, that leads to a question I'd be very interested to understand better: would including it be useful for, say, our Chinese speaking users? I'll tag @linonetwo and @oeyoews @BramChen @oflg who I think are our most active Chinese speaking contributors. If uFuzzy has the potential to be useful to Chinese speakers then I'd very much like to explore integrating it in the core.

oeyoews commented 10 months ago

uFuzzy uses regular expressions for search and matching, so it theoretically supports all languages. I did a small test and it seems good. It seems that it can be used in tw.
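To illustrate why a regular-expression approach is not tied to one script, here is a deliberately simplified sketch. It is not uFuzzy's API or its actual algorithm; `buildFuzzyRegExp` and `fuzzyFilter` are hypothetical names, and the rule "query characters must appear in order, with anything in between" is an assumption chosen for brevity.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that matches the query's characters in order, allowing
// any characters between them. Array.from splits by code point, so
// characters outside the Basic Multilingual Plane stay intact.
function buildFuzzyRegExp(query) {
  const chars = Array.from(query).filter((ch) => !/\s/u.test(ch));
  return new RegExp(chars.map(escapeRegExp).join(".*?"), "iu");
}

// Hypothetical helper: keep the titles that match the fuzzy pattern.
function fuzzyFilter(query, titles) {
  if (!query.trim()) return [];
  const re = buildFuzzyRegExp(query);
  return titles.filter((title) => re.test(title));
}

// Usage with Latin and Chinese titles.
console.log(fuzzyFilter("tdlr", ["Tiddler", "Filter", "TiddlyWiki"]));
console.log(fuzzyFilter("条目", ["新建条目", "条形目录", "目条"]));
```

Nothing in the pattern depends on word boundaries or an alphabet, which is why the same code handles both examples. Ranking the matches is left out of this sketch.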

pmario commented 10 months ago

> Hi @leeoniya thank you, uFuzzy looks very useful, and glad to see that it is a success. It is small enough that we might consider including it in the TiddlyWiki core (the other alternative is to integrate it via a plugin).

I don't think we need a new library that is language-dependent by default.

I think the existing diff-match-patch library already allows us to implement a fuzzy search as mentioned here at GH.

GH issue from @yaisog

I think it should be possible to use the match-demo code to implement a fuzzy search. @yaisog mentioned the "fuzzy location" parameter. But it is optional so setting it to 0 for our title search should be perfectly fine.

The only parameter that needs to be adjusted on a "per wiki / per language" basis may be the "match threshold", since it may depend on the language used in the wiki.
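The effect of a per-wiki threshold can be sketched without the library. The example below swaps in a normalized Levenshtein distance as a stand-in for diff-match-patch's own scoring, so it is not the library's algorithm; `levenshtein` and `searchTitles` are hypothetical names. It assumes the same convention as the threshold described above: 0.0 accepts only perfect matches and 1.0 accepts anything.

```javascript
// Classic Levenshtein edit distance, computed one row at a time.
// Array.from splits by code point so non-Latin text is handled per character.
function levenshtein(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[t.length];
}

// Keep titles whose normalized distance to the query is within the
// threshold. The default of 0.5 is an arbitrary choice for this sketch.
function searchTitles(query, titles, threshold = 0.5) {
  const q = query.toLowerCase();
  return titles.filter((title) => {
    const t = title.toLowerCase();
    const maxLen = Math.max(Array.from(q).length, Array.from(t).length);
    const score = maxLen === 0 ? 0 : levenshtein(q, t) / maxLen;
    return score <= threshold;
  });
}

// A looser threshold admits more titles for the same misspelled query.
console.log(searchTitles("widgit", ["Widget", "Macros", "Widgets"], 0.2));
console.log(searchTitles("widgit", ["Widget", "Macros", "Widgets"], 0.3));
```

A wiki written in a language with short words or a dense script may need a stricter threshold, because a single edit is a larger fraction of each title.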

pmario commented 10 months ago

Also see

linonetwo commented 10 months ago

If there is a standard usage in the core to handle fuzzy search, I can patch it to support Chinese fuzzy search.

I already have a version in https://github.com/tiddly-gittly/tiddlywiki-plugins/tree/master/src/pinyin-fuzzy-search that adds a pinyinfuse filter operator. I just don't have a way to apply it to the standard search in tw. It is currently only used in my command palette plugin.