domschrei / krunner-symbols

A lightweight KRunner plugin (Plasma 5) to retrieve unicode symbols, or any other string, based on a corresponding keyword.
GNU General Public License v3.0

full unicode support #4

Closed Thomqa closed 6 years ago

Thomqa commented 7 years ago

Full Unicode support with aliases; search patterns in the middle of words should also match.

domschrei commented 7 years ago

I like the idea of supporting unicode, and I see the following challenges:

Any thoughts?

domschrei commented 7 years ago

I found the official files in CSV-like structure. There is also a convenient, dictionary-like index file which might be really useful. I'll look into it.

domschrei commented 7 years ago

I created a branch with initial unicode support. It works with a glossary-like text file from unicode.org. These are not all unicode symbols, but it seems to be a reasonable collection of symbols with meaningful short descriptions.

The plugin loads the entire glossary into memory when launched (< 1 MB); it can then search for the entered term (non-fuzzy, case-insensitive, also matching substrings) and return the symbols found. The priority of each result still needs to be implemented (a floating-point number between 0.0 and 1.0 that should be higher the more likely it is that this is the result you were looking for). Also, the symbol definitions in krunner-symbolsrc need to be trimmed accordingly, so that there are no duplicates.
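
To illustrate the kind of lookup described here, a minimal Python sketch follows. It is not the plugin's code: the tab-separated glossary format, the function names load_glossary and search, and the length-ratio relevance are all assumptions made for the example. The relevance value is just one simple way to get a number in the 0.0 to 1.0 range.

```python
def load_glossary(lines):
    """Read hypothetical 'symbol<TAB>description' lines into a list of pairs."""
    glossary = []
    for line in lines:
        symbol, sep, description = line.rstrip("\n").partition("\t")
        if sep and symbol and description:
            glossary.append((symbol, description))
    return glossary


def search(glossary, term):
    """Non-fuzzy, case-insensitive substring search over the descriptions.

    Returns (symbol, description, relevance) tuples, most relevant first.
    The relevance is an assumed placeholder: the fraction of the description
    covered by the search term, which always lies in (0.0, 1.0].
    """
    needle = term.strip().casefold()
    if not needle:
        return []
    hits = []
    for symbol, description in glossary:
        haystack = description.casefold()
        if needle in haystack:
            relevance = len(needle) / len(haystack)
            hits.append((symbol, description, relevance))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits
```

With a glossary built from "→<TAB>rightwards arrow" and "♥<TAB>black heart suit", search(glossary, "ear") would match only the heart, because "ear" occurs in the middle of "heart".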

Thomqa commented 7 years ago

Nice! I will install it this weekend to test it.

domschrei commented 7 years ago

As of now, the plugin (inside the unicode branch) actually supports the entire Unicode database (i.e. it knows all definitions inside the official UnicodeData.txt file). The performance seems okay to me. I also implemented an advanced heuristic to sort the results from most to least relevant (though it might need some additional tweaking).
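
For reference, UnicodeData.txt is a semicolon-separated file in which the first field is the code point in hexadecimal and the second field is the character name. A rough Python sketch of reading it is shown below; the function name parse_unicode_data and the decision to skip entries whose name starts with "<" (control characters and range markers) are assumptions for illustration, not necessarily what the plugin does.

```python
def parse_unicode_data(lines):
    """Map character names to characters from UnicodeData.txt-style lines."""
    entries = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2:
            continue  # blank or malformed line
        name = fields[1]
        if name.startswith("<"):
            # "<control>" and range markers such as "<CJK Ideograph, First>"
            # carry no searchable name, so this sketch skips them.
            continue
        entries[name] = chr(int(fields[0], 16))
    return entries
```

The resulting dictionary could then be searched by name in the same way as the glossary.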

domschrei commented 7 years ago

The features have been merged into the master branch. Unicode support is disabled by default for now, but it can be enabled by a config setting (see the updated README). While doing so, I also implemented a proper "cascading" configuration, where local definitions / settings override global ones.
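
The cascading idea can be sketched in a few lines of Python. This is only an illustration of "local overrides global": the function name cascade and the section and key names in the usage note are hypothetical and not taken from the plugin or its README.

```python
def cascade(global_cfg, local_cfg):
    """Merge two {section: {key: value}} configs; local values take precedence."""
    # Copy the global sections so the inputs are not modified.
    merged = {section: dict(values) for section, values in global_cfg.items()}
    for section, values in local_cfg.items():
        # Keys present locally replace the global ones; others are kept.
        merged.setdefault(section, {}).update(values)
    return merged
```

For example, if the global config has a hypothetical section "Settings" with "UseUnicode" set to "false" and the local config sets the same key to "true", the merged result contains "true", while any global keys the local file does not mention are preserved.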

I'm not happy with the heuristic of relevance for the unicode symbols yet; I hope I can improve this soon.

domschrei commented 6 years ago

As mentioned in the change notes, the current release v1.0.4 now features a much better search and rank algorithm.