elixir-lang / elixir-lang.github.com

Website for Elixir
elixir-lang.org

Fix incorrect byte_size example #1737

Closed: thisistonydang closed this 1 year ago

thisistonydang commented 1 year ago

The example in Getting Started - Binaries, strings, and charlists shows byte_size("héllo") returning 6, but it should return 7 since the character é is three bytes.

Current snippet:

iex> string = "héllo"
"héllo"
iex> String.length(string)
5
iex> byte_size(string)
6

Proposed change:

iex> string = "héllo"
"héllo"
iex> String.length(string)
5
iex> byte_size(string)
7 # <- should return 7 here

josevalim commented 1 year ago

It depends. "é" can be written in two bytes as a single character "é" or three bytes as the character "e" followed by the accent. Which operating system and terminal are you using?
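To make the two spellings concrete, here is a minimal sketch. It writes both forms with explicit `\u` escapes so that copying and pasting cannot silently change them; the variable names `composed` and `decomposed` are only illustrative.

```elixir
# "é" as the single code point U+00E9 (2 bytes in UTF-8)
composed = "h\u00E9llo"
# "e" followed by the combining acute accent U+0301 (1 + 2 bytes in UTF-8)
decomposed = "he\u0301llo"

IO.inspect(byte_size(composed))        # 6
IO.inspect(byte_size(decomposed))      # 7

# String.length/1 counts graphemes, so both spellings report 5
IO.inspect(String.length(composed))    # 5
IO.inspect(String.length(decomposed))  # 5

# The binaries differ byte for byte, but they are canonically equivalent
IO.inspect(composed == decomposed)                    # false
IO.inspect(String.equivalent?(composed, decomposed))  # true
```

So both 6 and 7 can be a correct answer for `byte_size/1`, depending on which spelling ended up in the shell.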

thisistonydang commented 1 year ago

Oh, I didn't know that! I'm using the default Terminal on Mac with zsh.

[Image: Screenshot 2023-10-25 at 15 53 42]

josevalim commented 1 year ago

Interesting. So maybe it is your keyboard settings? What is your default keyboard layout? Or more precisely, how are you typing the é?

thisistonydang commented 1 year ago

I'm not typing the é. I'm copying and pasting it directly from the example in the Getting Started guide.

thisistonydang commented 1 year ago

I just tried typing the é with my keyboard instead and it came out as two bytes! So it looks like it definitely has something to do with the copy & pasting!

Edit: Another interesting data point - copy & pasting the é from Chrome and Safari results in 3 bytes, whereas copy & pasting from Firefox results in 2 bytes.
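One way to check which spelling a pasted string actually contains is to list its code points. This is only a sketch: `pasted` is a stand-in built with an escape, assumed here to be the three-byte variant of é described above.

```elixir
# Stand-in for a string pasted into the shell (assumption: decomposed form)
pasted = "he\u0301llo"

# Decode the binary one UTF-8 code point at a time
codepoints = for <<cp::utf8 <- pasted>>, do: cp

# Print as plain integers rather than as a charlist
IO.inspect(codepoints, charlists: :as_lists)
# [104, 101, 769, 108, 108, 111]

# 769 is 0x0301, the combining acute accent, so the é was pasted as "e" + accent
IO.inspect(Enum.member?(codepoints, 0x0301))  # true
```

A string typed as the single code point U+00E9 would instead show 233 in that position and only five code points in total.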

thisistonydang commented 1 year ago

Hmm, to avoid confusion like this, what do you think about changing the example to use hellö instead of héllo? I did not run into the same problem when copy&pasting hellö.

josevalim commented 1 year ago

Interesting. On my machine, "héllo" has 6 bytes on Firefox but 7 on Safari, so Safari is the one normalizing it! I wonder if this is recent, because otherwise we would have heard about it sooner.

However, we are currently moving the docs to Elixir: https://hexdocs.pm/elixir/main/binaries-strings-and-charlists.html - and on that page, it has 6 bytes in both browsers. So it is definitely a combination of both. By the end of next week at the latest, I will update all links to point to the new docs, so this will be resolved then.

Btw, unfortunately "hellö" could have the same issue, so if the issue comes back, we would need to pick something else altogether, such as emoji.
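A rough sketch of why "hellö" is not immune: ö also has a decomposed spelling, which `String.normalize/2` can produce. The emoji used below (U+1F600) is just one arbitrary pick, and the claim that it has no decomposition is made for that single code point only, not for emoji in general.

```elixir
# "ö" as the single code point U+00F6
composed = "hell\u00F6"
# NFD splits it into "o" followed by the combining diaeresis U+0308
decomposed = String.normalize(composed, :nfd)

IO.inspect(byte_size(composed))    # 6
IO.inspect(byte_size(decomposed))  # 7

# Normalizing back to NFC restores the original binary
IO.inspect(String.normalize(decomposed, :nfc) == composed)  # true

# A single-code-point emoji has no decomposition, so NFD leaves it unchanged
emoji = "\u{1F600}"
IO.inspect(byte_size(emoji))                      # 4
IO.inspect(String.normalize(emoji, :nfd) == emoji)  # true
```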

thisistonydang commented 1 year ago

Nice, I just checked the new docs and can confirm copy&pasting results in 6 bytes using Chrome, Firefox, and Safari. I'll go ahead and close this PR.

Thanks for taking the time to look at this small issue. This is my first time looking at Elixir so I'm glad the community is very responsive!