mbyte character support - Githubissues

RRethy / vim-illuminate

illuminate.vim - (Neo)Vim plugin for automatically highlighting other uses of the word under the cursor using either LSP, Tree-sitter, or regex matching.

mbyte character support #48

Closed by suliveevil 4 years ago

suliveevil commented 4 years ago

Would you please add multibyte character support? Highlighting all occurrences of the same single character in the range [\u4e00-\u9fa5] is all I need. Thank you very much.

RRethy commented 4 years ago

I'm not sure what you mean. Can you provide an example and describe what you expect to be highlighted based on your cursor position?

suliveevil commented 4 years ago

Treat each character in the range [\u4e00-\u9fa5] as a word. Then we can achieve:

Highlight all occurrences of the same single character in the range [\u4e00-\u9fa5]

Treat each Chinese character as a word

suliveevil commented 4 years ago

Chinese differs from English in that it has no delimiters between words.

So we can treat the smallest textobj in Chinese, i.e. a single character in the range [\u4e00-\u9fa5], as a word.

Then we can highlight the same pseudo-word as usual.
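
To make the request concrete, here is a minimal Python sketch of the behaviour described above. It is only an illustration, not how vim-illuminate works: the function name `same_char_positions` is hypothetical, and positions are plain string indices rather than Vim columns.

```python
import re
from typing import List

# The range quoted in this thread: U+4E00 through U+9FA5.
CJK_CHAR = re.compile(r"[\u4e00-\u9fa5]")


def same_char_positions(text: str, cursor: int) -> List[int]:
    """Return the index of every occurrence of the character under the cursor.

    Only characters inside the CJK range above count as a one-character
    "word"; for anything else (or an out-of-range cursor) nothing is returned.
    """
    if not 0 <= cursor < len(text):
        return []
    ch = text[cursor]
    if not CJK_CHAR.fullmatch(ch):
        return []
    # re.escape is not strictly needed for CJK characters, but keeps the
    # search literal if the range is ever widened.
    return [m.start() for m in re.finditer(re.escape(ch), text)]
```

With the cursor on the first character of `"我爱我的家"`, this returns the positions of both occurrences of `我`.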

suliveevil commented 4 years ago

If we have this match pattern, we can do even more with it: use text segmentation to highlight other textobj.

RRethy commented 4 years ago

> use text segmentation to highlight other textobj.

What does this mean?

As for matching specific Unicode characters as whole words: vim-illuminate uses \k, which is controlled by 'iskeyword' (see :h 'iskeyword'), and AFAIK that option doesn't support Unicode characters. vim-illuminate loosely highlights the same thing matched by the motion iw (try viw on your text and you will see what it highlights). To support matching specific Unicode characters, I would likely add an option to match against a regex pattern along the lines of [a-z], but with Unicode characters. However, I'm not sure about adding this; I'm going to think about it for a couple of days.
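
A rough Python sketch of what such a pattern option could look like, under stated assumptions: the names `token_at`, `highlight_spans`, `CJK_SINGLE` and `DEFAULT_WORD` are hypothetical, and Python's `\w+` is only a loose stand-in for Vim's `\k\+`, not an equivalent. The idea is to try the user-supplied pattern first and fall back to the default word pattern.

```python
import re
from typing import List, Optional, Sequence, Tuple

DEFAULT_WORD = r"\w+"            # loose stand-in for Vim's \k\+ (assumption)
CJK_SINGLE = r"[\u4e00-\u9fa5]"  # "[a-z], but with Unicode characters"


def token_at(text: str, cursor: int, pattern: str) -> Optional[str]:
    """Return the match of `pattern` that covers `cursor`, if any."""
    for m in re.finditer(pattern, text):
        if m.start() <= cursor < m.end():
            return m.group(0)
    return None


def highlight_spans(
    text: str,
    cursor: int,
    patterns: Sequence[str] = (CJK_SINGLE, DEFAULT_WORD),
) -> List[Tuple[int, int]]:
    """Return (start, end) spans of every token equal to the one under the cursor.

    Patterns are tried in order; the first one that yields a token at the
    cursor decides how the whole text is tokenized.
    """
    for pattern in patterns:
        token = token_at(text, cursor, pattern)
        if token is not None:
            return [
                (m.start(), m.end())
                for m in re.finditer(pattern, text)
                if m.group(0) == token
            ]
    return []
```

For the text `"foo 中文 foo 中"`, a cursor on the first `中` yields the spans of both `中` characters, while a cursor on `foo` falls through to the default pattern and yields both `foo` words. Passing `patterns=(DEFAULT_WORD,)` instead treats `中文` as one token, which is the difference the option would control.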

suliveevil commented 4 years ago

Text segmentation should be done by tools like jieba or other natural language grammar checkers.
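
As a sketch of how segmentation output could drive highlighting: the code below does not use jieba. It swaps in a toy greedy forward-maximum-matching segmenter with a hypothetical vocabulary, purely so the example is self-contained; a real segmenter would replace `segment`. The names `segment` and `spans_matching_cursor` are illustrative assumptions.

```python
from typing import List, Set, Tuple

Token = Tuple[str, int, int]  # (word, start, end)


def segment(text: str, vocab: Set[str], max_len: int = 4) -> List[Token]:
    """Toy greedy forward maximum matching (NOT jieba's algorithm).

    At each position, take the longest vocabulary entry that matches;
    characters not covered by the vocabulary become one-character tokens.
    """
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append((piece, i, i + size))
                i += size
                break
    return tokens


def spans_matching_cursor(text: str, cursor: int, vocab: Set[str]) -> List[Tuple[int, int]]:
    """Return spans of every segmented token equal to the one under the cursor."""
    tokens = segment(text, vocab)
    under = next((w for w, s, e in tokens if s <= cursor < e), None)
    if under is None:
        return []
    return [(s, e) for w, s, e in tokens if w == under]
```

With the hypothetical vocabulary `{"中文", "分词"}` and the text `"中文分词和中文"`, a cursor anywhere inside the first `中文` selects both occurrences of that two-character word rather than a single character.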

RRethy commented 4 years ago

> Text segmentation should be done by tools like jieba or other natural language grammar checkers.

For the record, I was confused by "textobj" (not by the first part of your previous sentence), because a text object means something very different from simply the tokens in a document. For example, vim-illuminate to an extent highlights the text object iw, while there are no text objects for single Unicode characters.

suliveevil commented 4 years ago

You are right. Thank you very much. I realized that I should make a custom textobj to cooperate with vim-illuminate; that's the Vim way. I will learn to write plugins.