Characters have different lengths under different encodings, and this affects how offsets are calculated.
For example, for the string "𐐀":
String#length returns 1 (the character count, equal to the UTF-32 length, whichever Unicode encoding the string uses).
RubyVM::AbstractSyntaxTree.parse("𐐀").children[2].last_column returns 4 (the UTF-8 byte size).
The LSP spec uses UTF-16 by default, where the length is 2 code units (4 bytes).
If the two sides do not count in the same unit, the offsets are wrong and so are the results.
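
The three lengths above can be reproduced in plain Ruby. This is only an illustrative sketch: `position_lengths` is a hypothetical helper name, not something in this change, and it assumes the input is a valid UTF-8 string.

```ruby
# Hypothetical helper (illustration only): report the length of a string
# in the three units discussed above. Assumes `str` is valid UTF-8.
def position_lengths(str)
  {
    chars: str.length,                                # code points (UTF-32 units)
    utf8_bytes: str.bytesize,                         # what the AST columns count
    utf16_units: str.encode("UTF-16LE").bytesize / 2  # LSP default unit
  }
end

p position_lengths("\u{10400}") # U+10400 is the character "𐐀"
```

For "𐐀" the three values are 1, 4 and 2, which is why a column taken from one source cannot be handed to another without conversion.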
Fortunately, for most characters the UTF-16 and UTF-32 lengths are the same, so that mismatch rarely causes problems. Byte length and character length, however, differ for any non-ASCII text, and that mismatch does produce errors.
Because Ruby's String methods work in characters rather than bytes, all offsets should be character based.
AbstractSyntaxTree::Node is passed to many places and is hard to control, so the simplest fix is to wrap it at creation and have the wrapper return character-based columns.
This should fix #539 and possibly other Unicode-related offset errors.
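
A minimal sketch of the wrapping idea follows. It is not the code in this change: `CharColumnNode` and its `source_lines` argument are hypothetical names, and it assumes the wrapped node exposes `first_lineno`, `first_column`, `last_lineno` and `last_column` (as RubyVM::AbstractSyntaxTree::Node does) and that the source is UTF-8.

```ruby
require "delegate"

# Hypothetical wrapper (illustration only): delegates everything to the
# wrapped node, but reports columns in characters instead of bytes.
class CharColumnNode < SimpleDelegator
  # node:         any object with *_lineno / *_column readers (byte based)
  # source_lines: the parsed source split into lines, UTF-8 encoded
  def initialize(node, source_lines)
    super(node)
    @source_lines = source_lines
  end

  def first_column
    char_column(__getobj__.first_lineno, __getobj__.first_column)
  end

  def last_column
    char_column(__getobj__.last_lineno, __getobj__.last_column)
  end

  private

  # Take the first `byte_column` bytes of the line and count the
  # characters they contain.
  def char_column(lineno, byte_column)
    line = @source_lines[lineno - 1] || ""
    line.byteslice(0, byte_column).length
  end
end
```

Because the wrapper is created once, right where the node is produced, callers further down keep using `first_column` and `last_column` unchanged.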
If the LSP client supports UTF-32, tell it to use utf-32 positions via capabilities.positionEncoding; then all positions should agree.
But some clients may only support the default utf-16 positions...
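
The negotiation could be sketched as below. Both helper names are hypothetical, and the sketch assumes the client capabilities arrive as a parsed JSON hash with string keys; per LSP 3.17 the client lists its encodings under general.positionEncodings, and utf-16 is the fallback when that list is absent.

```ruby
# Hypothetical helper (illustration only): pick the value to send back in
# the server's capabilities.positionEncoding.
def choose_position_encoding(client_capabilities)
  offered = client_capabilities.dig("general", "positionEncodings") || []
  offered.include?("utf-32") ? "utf-32" : "utf-16"
end

# Hypothetical fallback for utf-16-only clients: convert a character
# column on `line` into a count of UTF-16 code units.
def char_column_to_utf16(line, char_column)
  line[0, char_column].encode("UTF-16LE").bytesize / 2
end
```

With utf-32 negotiated, character-based columns can be sent as they are; with utf-16, only characters outside the Basic Multilingual Plane change the result of the conversion.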