Dhghomon / rust-fsharp

Rust - F# - Rust reference
MIT License

Rust chars are not UTF-8 #11

Closed by ChayimFriedman2 3 years ago

ChayimFriedman2 commented 3 years ago

The reference says: "Rust char in F# is a char (.NET Char). Rust char is UTF-8, while in F# they are UTF-16."

Rust char is effectively UTF-32: each char holds a single Unicode scalar value. The docs do not name the encoding, although they do specify that a char is always 4 bytes wide:

Representation

char is always four bytes in size. This is a different representation than a given character would have as part of a String...

https://doc.rust-lang.org/std/primitive.char.html#representation
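To make the difference concrete, here is a minimal sketch using only standard-library calls (`std::mem::size_of`, `char::len_utf8`, `char::len_utf16`). The helper names `char_size_in_memory`, `utf8_len`, and `utf16_len` are made up for illustration and are not part of the reference.

```rust
use std::mem::size_of;

/// Size in bytes of any `char` value in memory.
/// Hypothetical helper name; it just wraps `size_of::<char>()`.
pub fn char_size_in_memory() -> usize {
    size_of::<char>()
}

/// Number of bytes `c` takes when encoded as UTF-8,
/// i.e. the space it would occupy inside a `String`.
pub fn utf8_len(c: char) -> usize {
    c.len_utf8()
}

/// Number of 16-bit code units `c` takes when encoded as UTF-16,
/// i.e. how many .NET `Char` values would be needed to hold it.
pub fn utf16_len(c: char) -> usize {
    c.len_utf16()
}
```

With these helpers, `char_size_in_memory()` is 4 for every char, while `utf8_len('a')` is 1 and `utf8_len('\u{1F600}')` is 4. `utf16_len('\u{1F600}')` is 2, so a single Rust char outside the Basic Multilingual Plane corresponds to a surrogate pair of two .NET Chars.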

OTOH, Rust strings (String and str) are UTF-8 encoded: String is a wrapper around a Vec<u8> (https://doc.rust-lang.org/src/alloc/string.rs.html#279-281), and str is a view into a sequence of UTF-8 bytes.
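A small sketch of what that means in practice, again using only standard-library calls (`str::as_bytes`, `str::len`, `str::chars`, `String::from_utf8`). The function names are hypothetical and chosen only for this example.

```rust
/// Copies out the UTF-8 bytes that back a string slice.
pub fn utf8_bytes(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}

/// Length of the string in bytes; this is what `len()` reports.
pub fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of `char`s (Unicode scalar values) in the string.
pub fn char_count(s: &str) -> usize {
    s.chars().count()
}

/// Turns a byte vector into a `String`, reusing the allocation.
/// Returns `None` when the bytes are not valid UTF-8.
pub fn string_from_bytes(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}
```

For example, `utf8_bytes("\u{e9}")` gives the two bytes `[0xC3, 0xA9]`, and `"h\u{e9}llo"` has a `byte_len` of 6 but a `char_count` of 5. `string_from_bytes(vec![0xFF])` returns `None`, because a String may only wrap bytes that are valid UTF-8.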

Dhghomon commented 3 years ago

Oh yeah, silly me. I do have a note on String being Vec<u8> though.