9fans / acme-lsp

Language Server Protocol tools for the acme text editor
MIT License

Correctly handle UTF-16 offsets #7

Open fhs opened 5 years ago

fhs commented 5 years ago

LSP uses UTF-16 offsets:

A position inside a document (see Position definition below) is expressed as a zero-based line and character offset. The offsets are based on a UTF-16 string representation. So a string of the form a𐐀b the character offset of the character a is 0, the character offset of 𐐀 is 1 and the character offset of b is 3 since 𐐀 is represented using two code units in UTF-16.
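To make the quoted rule concrete, here is a minimal Go sketch that computes the UTF-16 character offset at which each rune of a string starts. The helpers `utf16Len` and `utf16Offsets` are hypothetical illustrations, not part of acme-lsp. The only fact they rely on is that code points at or above U+10000 need a surrogate pair (two code units) in UTF-16, and every other code point needs one.

```go
package main

import "fmt"

// utf16Len reports how many UTF-16 code units r occupies once encoded:
// 2 for supplementary-plane characters (a surrogate pair), otherwise 1.
// Invalid runes count as 1, as utf16.Encode replaces them with U+FFFD.
func utf16Len(r rune) int {
	if r >= 0x10000 && r <= 0x10FFFF {
		return 2
	}
	return 1
}

// utf16Offsets returns the UTF-16 character offset at which each rune of s starts.
func utf16Offsets(s string) []int {
	var offs []int
	off := 0
	for _, r := range s { // ranging over a string yields runes, not bytes
		offs = append(offs, off)
		off += utf16Len(r)
	}
	return offs
}

func main() {
	fmt.Println(utf16Offsets("a𐐀b")) // [0 1 3]
}
```

The output matches the spec's example: `a` is at 0, `𐐀` at 1, and `b` at 3.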

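Acme uses rune (Unicode code point) offsets. Currently, we treat the UTF-16 offsets as rune offsets (and vice versa) to simplify the implementation. That is wrong: the two agree only while every character on the line lies in the Basic Multilingual Plane. A character outside it, such as 𐐀, counts as one rune but two UTF-16 code units, so every position after it on the same line is off by one.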
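A fix needs conversions in both directions between an acme rune offset and an LSP UTF-16 character offset within one line. Below is a minimal Go sketch, assuming the line's text is available as a Go (UTF-8) string. The names `runeToUTF16` and `utf16ToRune` are hypothetical, not acme-lsp's actual API. Offsets past the end of the line are clamped, and a UTF-16 offset that falls between the two halves of a surrogate pair is rounded up to the next rune. Both are design choices for this sketch rather than behavior the LSP spec prescribes.

```go
package main

import "fmt"

// utf16Len reports how many UTF-16 code units r occupies:
// 2 for supplementary-plane characters (a surrogate pair), otherwise 1.
func utf16Len(r rune) int {
	if r >= 0x10000 && r <= 0x10FFFF {
		return 2
	}
	return 1
}

// runeToUTF16 converts a rune offset within line (acme's unit) into a
// UTF-16 code unit offset (LSP's unit). Offsets past the end are clamped.
func runeToUTF16(line string, runeOff int) int {
	n, i := 0, 0
	for _, r := range line {
		if i == runeOff {
			break
		}
		n += utf16Len(r)
		i++
	}
	return n
}

// utf16ToRune converts a UTF-16 code unit offset within line into a rune
// offset. An offset inside a surrogate pair rounds up to the following rune.
func utf16ToRune(line string, u16Off int) int {
	n, i := 0, 0
	for _, r := range line {
		if n >= u16Off {
			break
		}
		n += utf16Len(r)
		i++
	}
	return i
}

func main() {
	line := "a𐐀b"
	fmt.Println(runeToUTF16(line, 2)) // 3: 'b' is rune 2 but UTF-16 unit 3
	fmt.Println(utf16ToRune(line, 3)) // 2
}
```

Since both functions walk the line from its start, each conversion costs time linear in the line length. That seems acceptable here because LSP positions are already relative to a line.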