Vonr / align.nvim

A minimal plugin for NeoVim for aligning lines
GNU General Public License v3.0

Does not work with umlauts like äöü #15

Closed: rafo closed this issue 8 months ago

rafo commented 1 year ago

It doesn't seem to work with umlauts (and possibly other special characters):

Try:

aVjja     =1;
fsoaao=2;
longs=1;
c=2
läääüüüööökslkasjdlncljksdaksjdlak =22

and align on "=". Result:

aVjja                                       =1;
fsoaao                                      =2;
longs                                       =1;
c                                           =2
läääüüüööökslkasjdlncljksdaksjdlak =22

It works without äääüüüööö:

aVjja                     =1;
fsoaao                    =2;
longs                     =1;
c                         =2
lkslkasjdlncljksdaksjdlak =22

Vonr commented 9 months ago

This looks like an issue with Unicode characters not necessarily having the same visual length. I don't think Lua has any good tools to solve this without reaching for LuaRocks, which I aim to avoid.

Unfortunately, this issue will probably remain unsolved for a while.
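
For readers following along, here is a minimal Python sketch of why a byte-based width measurement would produce the kind of misalignment reported above. It assumes, without confirmation from the thread, that widths were measured in bytes, as Lua's `#` operator does for strings. The `measure` helper is hypothetical and only illustrates the difference between the two counts.

```python
import unicodedata

# Left-hand sides from the report, normalized to NFC so that each umlaut
# is a single code point (which takes two bytes in UTF-8).
names = ["aVjja", "fsoaao", "longs", "c", "läääüüüööökslkasjdlncljksdaksjdlak"]
names = [unicodedata.normalize("NFC", n) for n in names]


def measure(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


for name in names:
    chars, nbytes = measure(name)
    print(f"{name!r}: {chars} characters, {nbytes} bytes")

# Padding every line to the widest *byte* count over-pads the ASCII-only
# lines by the number of extra bytes in the umlaut line.
widest_chars = max(measure(n)[0] for n in names)
widest_bytes = max(measure(n)[1] for n in names)
extra_padding = widest_bytes - widest_chars
print("extra padding columns:", extra_padding)
```

In this sample the umlaut line has nine more bytes than characters, so the ASCII-only lines would be pushed nine columns too far to the right, which is the direction of the error shown in the report.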

rafo commented 9 months ago

According to https://www.lua.org/manual/5.3/manual.html#6.5, it looks like it's possible to get the number of (UTF-8) characters in each line with utf8.len (s [, i [, j]]). Shouldn't it then be easy to calculate the position dst at which to insert src? I hope that's not naive, but your hint about "visual length" suggests to me that some internal Vim representation of "a string in a line" is involved rather than plain Lua string calculations... I did not read the whole code, though. (I am more of a Python guy and haven't coded in Lua.)
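
The idea in this comment can be sketched in Python, where `len()` on a string already counts code points, roughly what `utf8.len` returns in Lua 5.3. This is an illustration only, not the plugin's code: the `align` function and its `dst` variable are hypothetical names chosen to mirror the wording of the comment.

```python
def align(lines, sep="="):
    """Pad the text before the first `sep` so that `sep` lands in one column.

    Width is measured in code points via len(), not in bytes.
    Lines that do not contain `sep` are returned unchanged.
    """
    parts = [line.partition(sep) for line in lines]
    heads = [head.rstrip() for head, found, _ in parts if found]
    # dst: the column for sep, one space after the widest left-hand side.
    dst = max((len(head) for head in heads), default=0) + 1

    aligned = []
    for line, (head, found, tail) in zip(lines, parts):
        if not found:
            aligned.append(line)
            continue
        # str.ljust pads by code points, so umlauts count as one column each.
        aligned.append(head.rstrip().ljust(dst) + sep + tail)
    return aligned


sample = [
    "aVjja     =1;",
    "fsoaao=2;",
    "longs=1;",
    "c=2",
    "läääüüüööökslkasjdlncljksdaksjdlak =22",
]
for row in align(sample):
    print(row)
```

Counting code points handles precomposed umlauts, but code points are still not display columns: wide or combining characters occupy two or zero columns, which is the "visual length" caveat raised earlier in the thread.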

Vonr commented 9 months ago

That's true for UTF-8; unfortunately, not everything is UTF-8.

rafo commented 9 months ago

But most files are, and UTF-8 can encode all characters from other character sets. Even the Vim documentation recommends configuring Vim to use UTF-8 internally:

The most popular one is UTF-8, which uses one or more bytes for each character and is backwards compatible with ASCII. On MS-Windows UTF-16 is also used (previously UCS-2), which uses 16-bit words. Vim can support all of these encodings, but always uses UTF-8 internally.

https://vimdoc.sourceforge.net/htmldoc/mbyte.html#Unicode

Vim will treat the file as, for example, Latin-1 only when the file is unambiguously Latin-1. A file containing only 7-bit ASCII codes is valid Latin1, but it's also valid UTF-8. Such a file will normally be treated as UTF-8 in VIM.

Maybe an easy solution could be to check whether encoding=utf-8 is set?

rafo commented 9 months ago

If your current locale is in utf-8 encoding, Vim will automatically start in utf-8 mode.

Vonr commented 9 months ago

Also for the record, LuaJIT is on Lua 5.1, so utf8 doesn't even exist there.

ronisbr commented 8 months ago

Hi!

Why not use a module like https://github.com/starwing/luautf8 to compute the text width of the string instead of counting characters? It should work flawlessly with and without UTF-8.

Vonr commented 8 months ago

There is no standardized way to package LuaRocks dependencies for Neovim plugins. For example, packer.nvim has a way, but lazy.nvim does not.

ronisbr commented 8 months ago

What about copying the entire plugin inside align.nvim?

ronisbr commented 8 months ago

Like this one here: https://github.com/Stepets/utf8.lua

ronisbr commented 8 months ago

After analyzing the alternatives, I think we just need two small plugins to implement all the required functionality:

https://github.com/uga-rosa/utf8.nvim https://github.com/aperezdc/lua-wcwidth

What do you think about just copying those plugins inside this one?

echasnovski commented 8 months ago

This looks like an issue with unicode characters not necessarily being the same in visual length, I don't think Lua has any good tools to solve this without reaching for luarocks, which I aim to avoid.

Unfortunately, this issue will probably remain unsolved for a while.

Neovim's Lua doesn't have any good tools yet (only vim.str_utfindex(), vim.str_byteindex(), and some others), but Neovim's Vimscript does, for example :h strchars() and :h strcharpart().

Vonr commented 8 months ago

I'm happy with that solution. I think I've got a working fix, according to manual tests.

Vonr commented 8 months ago

Never mind, it was still quite a complex fix. I'm currently using strdisplaywidth via vim.fn, if anyone's curious.
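
To illustrate what "display width" means as opposed to a character count, here is a rough Python approximation. It is not a reimplementation of Neovim's strdisplaywidth(), which also accounts for things like tabs and editor options. The `display_width` function is hypothetical and relies only on Python's standard `unicodedata` module.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns that text occupies.

    Assumptions of this sketch: combining marks take 0 columns, East Asian
    wide and fullwidth characters take 2, and everything else takes 1.
    Tabs and control characters are not handled.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining mark: drawn on top of the previous character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide or fullwidth character
        else:
            width += 1
    return width


for sample in ["abc", "\u00e4", "a\u0308", "\u65e5\u672c"]:
    print(repr(sample), len(sample), display_width(sample))
```

The third sample, "a" followed by a combining diaeresis, has two code points but occupies a single column, and the fourth has two code points but occupies four columns. Cases like these are why a width function tends to be a safer basis for alignment than either byte or character counts.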

rafo commented 8 months ago

Wow! Thanx!