Closed by thisistonydang 1 year ago
It depends. In UTF-8, "é" can be encoded in two bytes as the single precomposed character "é", or in three bytes as the character "e" followed by a combining accent. Which operating system and terminal are you using?
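To make the two encodings concrete, here is a minimal sketch in Elixir. It uses \u escapes rather than a pasted "é" so the form of each string is unambiguous. The variable names are only illustrative.

```elixir
# Precomposed form: the single code point U+00E9 (2 bytes in UTF-8).
precomposed = "\u00E9"

# Decomposed form: "e" (1 byte) followed by the combining acute accent U+0301 (2 bytes).
decomposed = "e\u0301"

IO.inspect(byte_size(precomposed))  # 2
IO.inspect(byte_size(decomposed))   # 3

# Both forms are a single grapheme, which is why they look identical on screen.
IO.inspect(String.length(precomposed))  # 1
IO.inspect(String.length(decomposed))   # 1
```

The byte counts differ even though both strings display as the same character.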
Oh, I didn't know that! I'm using the default Terminal on Mac with zsh.
Interesting. So maybe it is your keyboard settings? What is your default keyboard layout? Or, more precisely, how are you typing the é?
I'm not typing the é. I'm copying and pasting it directly from the example in the Getting Started guide.
I just tried typing the é with my keyboard instead and it came back as two bytes! So it looks like it definitely has something to do with the copying and pasting!
Edit: Another interesting data point: copying and pasting the é from Chrome and Safari results in 3 bytes, whereas copying and pasting from Firefox results in 2 bytes.
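One way to check which form a pasted string actually contains is to list its code points next to its byte count. The sketch below assumes a hypothetical helper module, PasteCheck, that is not part of Elixir or the docs. The two inputs are written with \u escapes to stand in for the two kinds of pasted text.

```elixir
defmodule PasteCheck do
  # Hypothetical helper: summarizes how a string is encoded.
  def describe(string) do
    %{
      bytes: byte_size(string),
      graphemes: String.length(string),
      # Code points as uppercase hex strings, e.g. "E9" for U+00E9.
      codepoints:
        string
        |> String.to_charlist()
        |> Enum.map(&Integer.to_string(&1, 16))
    }
  end
end

# Precomposed é (U+00E9): 6 bytes, code points 68 E9 6C 6C 6F.
IO.inspect(PasteCheck.describe("h\u00E9llo"))

# Decomposed é (e + U+0301): 7 bytes, code points 68 65 301 6C 6C 6F.
IO.inspect(PasteCheck.describe("he\u0301llo"))
```

Passing a string pasted from the browser to describe/1 in IEx would show whether it contains E9 or the pair 65, 301.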
Hmm, to avoid confusion like this, what do you think about changing the example to use hellö instead of héllo? I did not run into the same problem when copying and pasting hellö.
Interesting. In my browsers, "héllo" has 6 bytes in Firefox but 7 in Safari, so Safari is the one normalizing it! I wonder if this is recent, because otherwise we would have heard about it sooner.
However, we are currently moving the docs to Elixir: https://hexdocs.pm/elixir/main/binaries-strings-and-charlists.html - and on this page, it has 6 bytes in both browsers. So it is definitely a combination of both. By the end of next week at the latest, I will update all links to point to the new docs, so this will be resolved then.
Btw, unfortunately "hellö" could have the same issue, so if the issue comes back, we would need to pick something else altogether, such as emoji.
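As a rough illustration of why "hellö" is not immune, the sketch below builds both forms of the word with \u escapes and compares them. The variable names are illustrative. String.equivalent?/2 and String.normalize/2 are the standard library functions for canonical equivalence and normalization.

```elixir
# ö as the single code point U+00F6.
umlaut_composed = "hell\u00F6"

# ö as "o" followed by the combining diaeresis U+0308.
umlaut_decomposed = "hello\u0308"

IO.inspect(byte_size(umlaut_composed))    # 6
IO.inspect(byte_size(umlaut_decomposed))  # 7

# The two strings are canonically equivalent even though their bytes differ.
IO.inspect(String.equivalent?(umlaut_composed, umlaut_decomposed))  # true

# Normalizing to NFC composes the pair, giving the 6-byte form again.
IO.inspect(byte_size(String.normalize(umlaut_decomposed, :nfc)))  # 6
```

So any accented Latin letter with a precomposed code point can show the same one-byte difference, depending on how the page or browser serves the text.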
Nice, I just checked the new docs and can confirm copy&pasting results in 6 bytes using Chrome, Firefox, and Safari. I'll go ahead and close this PR.
Thanks for taking the time to look at this small issue. This is my first time looking at Elixir so I'm glad the community is very responsive!
The example in Getting Started - Binaries, strings, and charlists shows byte_size("héllo") returning 6, but it should return 7 since the character é is three bytes.
Current snippet:
Proposed change: