castwide / solargraph

A Ruby language server.
https://solargraph.org
MIT License
1.87k stars, 154 forks

fix: offset error for unicode #620

Closed: SolaWing closed this 1 year ago

SolaWing commented 1 year ago

Characters have different lengths under different encoding standards, and this affects how offsets are calculated.

For example, for the string "𐐀" (U+10400):

- String#length returns 1 (the UTF-32 character length, regardless of the string's encoding).
- RubyVM::AbstractSyntaxTree.parse("𐐀").children[2].last_column returns 4 (the UTF-8 byte size).
- The LSP specification uses UTF-16 by default, where the length is 2 code units (4 bytes).
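The three measurements above can be reproduced with plain String methods. This is a small illustrative sketch, not code from the PR; it counts UTF-16 and UTF-32 code units by re-encoding and dividing the byte size by the unit width.

```ruby
s = "𐐀" # U+10400, outside the Basic Multilingual Plane

p s.length                          # 1 (characters / code points)
p s.bytesize                        # 4 (UTF-8 bytes)
p s.encode("UTF-16LE").bytesize / 2 # 2 (UTF-16 code units: a surrogate pair)
p s.encode("UTF-32LE").bytesize / 4 # 1 (UTF-32 code units)
```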

If the standards do not match, offsets will be wrong and lead to incorrect results.

Fortunately, most characters have the same length in UTF-16 and UTF-32 (everything in the Basic Multilingual Plane), so that mismatch rarely causes problems. With non-ASCII text, however, the UTF-8 byte length and the character length generally differ, which leads to errors.
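To make the failure concrete, here is a hedged illustration (the sample line is invented for the example): a UTF-8 byte offset used as if it were a Ruby character index points at the wrong text as soon as the line contains a multi-byte character.

```ruby
line = "𐐀 foo bar"

char_index = line.index("foo")             # character index of "foo": 2
byte_index = line[0, char_index].bytesize  # UTF-8 byte offset of "foo": 5

p line[char_index, 3] # prints "foo"
# Treating the byte offset as a character index lands 3 characters too far.
p line[byte_index, 3] # prints " ba"
```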

Because Ruby strings are indexed by characters rather than bytes, all offsets should be based on character length.
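Normalizing to character offsets means converting from both other units: byte columns coming from the parser and UTF-16 columns coming from the client. The helpers below are a hypothetical sketch of those two conversions (the method names are illustrative, not Solargraph's API), assuming each line is a UTF-8 Ruby String.

```ruby
# Hypothetical helper: converts a UTF-8 byte column (as reported by
# RubyVM::AbstractSyntaxTree) into a Ruby character index on the same line.
def byte_to_char_column(line, byte_col)
  line.byteslice(0, byte_col).length
end

# Hypothetical helper: converts an LSP UTF-16 column (the protocol default)
# into a Ruby character index by summing UTF-16 code units per character.
def utf16_to_char_column(line, utf16_col)
  units = 0
  line.each_char.with_index do |ch, i|
    return i if units >= utf16_col
    # Characters above U+FFFF take a surrogate pair (2 units) in UTF-16.
    units += ch.ord > 0xFFFF ? 2 : 1
  end
  line.length
end

p byte_to_char_column("𐐀 foo", 5)  # prints 2
p utf16_to_char_column("𐐀 foo", 3) # prints 2
```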

Since AbstractSyntaxTree::Node objects are passed to many places and are hard to control, the simplest fix is to wrap each node when it is created and have the wrapper return character-based columns.
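A wrapper of the kind described could look roughly like this. It is a sketch under stated assumptions, not the PR's actual code: FakeNode is a hypothetical stand-in for RubyVM::AbstractSyntaxTree::Node (only its position readers matter), and CharColumnNode is a hypothetical name. Delegating everything else via SimpleDelegator keeps the wrapper transparent to callers.

```ruby
require "delegate"

# Hypothetical stand-in for RubyVM::AbstractSyntaxTree::Node; as with the
# real node, columns are UTF-8 byte offsets and line numbers are 1-based.
FakeNode = Struct.new(:first_lineno, :first_column, :last_lineno, :last_column)

# Hypothetical wrapper: forwards every call to the node, but reports
# character-based columns instead of byte-based ones.
class CharColumnNode < SimpleDelegator
  def initialize(node, source_lines)
    super(node)
    @lines = source_lines
  end

  def first_column
    to_char_column(first_lineno, __getobj__.first_column)
  end

  def last_column
    to_char_column(last_lineno, __getobj__.last_column)
  end

  private

  # Counts the characters in the first byte_col bytes of the line.
  def to_char_column(lineno, byte_col)
    @lines[lineno - 1].byteslice(0, byte_col).length
  end
end

# The literal "𐐀" in this line spans bytes 4...10, i.e. characters 4...7.
node = CharColumnNode.new(FakeNode.new(1, 4, 1, 10), ["x = \"𐐀\"\n"])
p node.first_column # prints 4
p node.last_column  # prints 7
```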

This should fix #539 and possibly other Unicode-related offset errors.

If the LSP client supports UTF-32, the server can tell it to use UTF-32 positions via capabilities.positionEncoding, and then all positions would agree. But some clients may only support the default UTF-16 positions...
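As a hedged sketch of that negotiation (helper names are hypothetical): in LSP 3.17 the client lists the encodings it accepts under general.positionEncodings, the server answers with a single positionEncoding, and UTF-16 applies when nothing else is agreed. When the fallback is UTF-16, character columns must be converted back to UTF-16 code units before they are sent.

```ruby
# Hypothetical helper: choose the positionEncoding to announce in the
# server's InitializeResult, based on what the client offered.
def choose_position_encoding(client_capabilities)
  offered = client_capabilities.dig("general", "positionEncodings") || []
  offered.include?("utf-32") ? "utf-32" : "utf-16"
end

# Hypothetical helper: convert a Ruby character index into the column to
# send to the client in the negotiated encoding.
def char_to_lsp_column(line, char_col, encoding)
  return char_col if encoding == "utf-32" # UTF-32 units equal characters
  line[0, char_col].encode("UTF-16LE").bytesize / 2
end

caps = { "general" => { "positionEncodings" => ["utf-16", "utf-32"] } }
p choose_position_encoding(caps)           # prints "utf-32"
p choose_position_encoding({})             # prints "utf-16"
p char_to_lsp_column("𐐀 foo", 2, "utf-16") # prints 3
```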

castwide commented 1 year ago

Thanks!