aardappel / lobster

The Lobster Programming Language
http://strlen.com/lobster
Indexing in strings with special characters doesn't work properly #267

Closed Hjagu09 closed 9 months ago

Hjagu09 commented 9 months ago

example

print "å"[0]
print string_to_unicode("å")

output

195
[229]

expected output

229
[229]

testing with more characters gives me this:

The same thing happens with for loops.

aardappel commented 9 months ago

I'm afraid this does work properly: indexing works by byte, not by Unicode character. Strings use a UTF-8 representation, in which a character can take more than one byte, so O(1) indexing by character would not be possible.

This is exactly the reason we have string_to_unicode: it turns the string into a vector, which is indexable by Unicode code point.
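
To illustrate what such a conversion involves, here is a minimal C++ sketch of decoding UTF-8 bytes into a vector of code points. The helper name decode_utf8 is hypothetical, this is not Lobster's actual implementation of string_to_unicode, and it assumes well-formed input (no validation of malformed or truncated sequences).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Assumes well-formed input; malformed sequences are not detected.
std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else                            { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 0; k < extra && i < s.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

With this sketch, the two bytes 0xC3 0xA5 that encode "å" decode to the single code point 229, which matches the [229] in the report above. The decoding is a linear pass over the bytes, which is why it is done once up front rather than on every index operation.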

If you index a C++ std::string, you'll get the same result. Much like C++, a Lobster string does not promise that its contents are UTF-8 (we use strings for arbitrary binary buffers), only that if you store string data in it, it will be UTF-8.
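
The C++ comparison can be checked with a small sketch. The helper byte_at is hypothetical and only wraps std::string's operator[] with a cast, since char may be signed on some platforms. The string is written with explicit byte escapes so the result does not depend on the source file's encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the byte at index i as a value in 0..255.
// std::string::operator[] yields a char, which may be signed, hence the cast.
int byte_at(const std::string& s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}
```

For the UTF-8 encoding of "å", std::string("\xC3\xA5").size() is 2, byte_at(s, 0) is 195 and byte_at(s, 1) is 165. The value 195 is the first byte of the two-byte sequence, not the code point 229.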

Hjagu09 commented 9 months ago

Thank you, this issue can be closed