cppalliance / safe-cpp

Boost Software License 1.0

Design: UTF string handling #1

Open vinniefalco opened 2 months ago

vinniefalco commented 2 months ago

Rust specifies that all constructed strings contain valid UTF-8. The Safe Standard Library design should reflect this in a way that preserves the C++ values of bare-metal performance and pay-for-what-you-use to the greatest extent possible.

cdesouza-chromium commented 2 months ago

To summarise what has been discussed: Rust's safety model requires that strings and string_views always hold complete UTF code points, so UTF fragments cannot occur. It validates byte input as UTF-8 before constructing a new string. This also requires removing operator[] as an accessor, to avoid indexing into the middle of a code point; instead, all access goes through iterators. To support subscripts, we should introduce char32_t.
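A minimal sketch of the validate-then-construct idea summarised above, written in standard C++17 rather than Safe C++. The names `is_valid_utf8`, `utf8_string`, and `from_bytes` are hypothetical and are not part of any proposed std2 API. The only way to obtain a `utf8_string` is through a factory that checks the bytes first, and the type deliberately offers no `operator[]`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Returns true if `bytes` is well-formed UTF-8. Rejects overlong encodings,
// surrogate code points, and values above U+10FFFF.
constexpr bool is_valid_utf8(std::string_view bytes) noexcept {
    std::size_t i = 0;
    const std::size_t n = bytes.size();
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) { ++i; continue; }        // single-byte (ASCII)
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;       // allowed range of byte 2
        if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; }
        else if (b0 == 0xE0)               { len = 3; lo = 0xA0; }  // no overlongs
        else if (b0 == 0xED)               { len = 3; hi = 0x9F; }  // no surrogates
        else if (b0 >= 0xE1 && b0 <= 0xEF) { len = 3; }
        else if (b0 == 0xF0)               { len = 4; lo = 0x90; }  // no overlongs
        else if (b0 >= 0xF1 && b0 <= 0xF3) { len = 4; }
        else if (b0 == 0xF4)               { len = 4; hi = 0x8F; }  // <= U+10FFFF
        else return false;                 // stray continuation or invalid lead
        if (n - i < len) return false;     // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}

// Hypothetical string type whose invariant is "always valid UTF-8".
class utf8_string {
    std::string data_;
    // Private: callers must go through from_bytes, which validates.
    explicit utf8_string(std::string s) : data_(std::move(s)) {
        assert(is_valid_utf8(data_));
    }
public:
    // Validates the input before constructing; empty optional on failure.
    static std::optional<utf8_string> from_bytes(std::string_view bytes) {
        if (!is_valid_utf8(bytes)) return std::nullopt;
        return utf8_string(std::string(bytes));
    }
    // Read-only access to the underlying bytes. There is intentionally no
    // operator[], so a caller cannot index into the middle of a code point.
    std::string_view bytes() const noexcept { return data_; }
};
```

Under these assumptions, `utf8_string::from_bytes("\xC3")` yields an empty optional because the two-byte sequence is truncated, while `utf8_string::from_bytes("\xC3\xA9")` succeeds. The validation cost is paid once at construction, which is the part of the design most in tension with pay-for-what-you-use.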

The same check must occur with std2::string_constant, but at compile time.
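One way the compile-time variant could look in plain C++17, as a sketch only: the interface of `std2::string_constant` is not described in this thread, so `utf8_literal` below is a hypothetical stand-in. The validator is `constexpr`, and the constructor throws on bad input; when the object is declared `constexpr`, reaching the `throw` is not a constant expression, so an ill-formed literal is rejected by the compiler.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Compact constexpr well-formedness check: decode each sequence, then reject
// overlong encodings, surrogates, and values above U+10FFFF.
constexpr bool is_valid_utf8(std::string_view s) noexcept {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;      // decoded code point
        char32_t min_cp = 0;  // smallest value this length may encode
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false;            // stray continuation or invalid lead byte
        if (s.size() - i < len) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// Hypothetical literal wrapper; not the std2::string_constant interface.
class utf8_literal {
    std::string_view text_;
public:
    constexpr explicit utf8_literal(std::string_view s) : text_(s) {
        // In a constant expression, reaching this throw is a compile error.
        if (!is_valid_utf8(s)) throw std::invalid_argument("not valid UTF-8");
    }
    constexpr std::string_view view() const noexcept { return text_; }
};

// Checked entirely by the compiler: "caf" followed by U+00E9 (C3 A9).
constexpr utf8_literal greeting{"caf\xC3\xA9"};

// A truncated sequence is rejected; declaring
//   constexpr utf8_literal bad{"\xC3"};
// would fail to compile for the same reason.
static_assert(!is_valid_utf8("\xC3"), "truncated sequence must be rejected");
```

Because the check runs during constant evaluation, a valid literal carries no run-time validation cost. A C++20 `consteval` constructor would make the compile-time-only intent stricter, at the price of requiring a newer language mode.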

vinniefalco commented 2 months ago

What does it mean to support subscripts? I thought operator[] was removed. An iterator over, say, a UTF-8 string would need to have a UTF-32 value type, I suppose.
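To make the iterator idea concrete, here is a sketch of a code point iterator over UTF-8 bytes whose `value_type` is `char32_t`. All names (`utf8_code_point_iterator`, `code_points`, `to_utf32`) are hypothetical. It assumes the input has already been validated, which is exactly what the string invariant would guarantee, so the decoder performs no error checks.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical input iterator that yields one char32_t per code point.
// Precondition: the underlying bytes are valid UTF-8.
class utf8_code_point_iterator {
    const char* p_ = nullptr;

    // Sequence length from the lead byte (valid input assumed).
    static std::size_t sequence_length(unsigned char b0) noexcept {
        if (b0 < 0x80) return 1;
        if (b0 < 0xE0) return 2;
        if (b0 < 0xF0) return 3;
        return 4;
    }
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = char32_t;  // values are decoded on the fly

    utf8_code_point_iterator() = default;
    explicit utf8_code_point_iterator(const char* p) noexcept : p_(p) {}

    // Decodes the code point at the current position.
    char32_t operator*() const noexcept {
        assert(p_ != nullptr);
        const unsigned char b0 = static_cast<unsigned char>(p_[0]);
        const std::size_t len = sequence_length(b0);
        if (len == 1) return b0;
        // Payload bits of the lead byte: 5, 4, or 3 bits for length 2, 3, 4.
        char32_t cp = b0 & (0x7F >> len);
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(p_[k]) & 0x3F);
        return cp;
    }

    // Advances by a whole code point, never into the middle of one.
    utf8_code_point_iterator& operator++() noexcept {
        p_ += sequence_length(static_cast<unsigned char>(*p_));
        return *this;
    }
    utf8_code_point_iterator operator++(int) noexcept {
        utf8_code_point_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(utf8_code_point_iterator a,
                           utf8_code_point_iterator b) noexcept {
        return a.p_ == b.p_;
    }
    friend bool operator!=(utf8_code_point_iterator a,
                           utf8_code_point_iterator b) noexcept {
        return a.p_ != b.p_;
    }
};

// Range adaptor so the iterator works with range-based for.
struct code_points {
    std::string_view utf8;  // assumed valid UTF-8
    utf8_code_point_iterator begin() const noexcept {
        return utf8_code_point_iterator(utf8.data());
    }
    utf8_code_point_iterator end() const noexcept {
        return utf8_code_point_iterator(utf8.data() + utf8.size());
    }
};

// Collects the decoded code points into a UTF-32 string.
inline std::u32string to_utf32(std::string_view valid_utf8) {
    std::u32string out;
    for (char32_t cp : code_points{valid_utf8}) out.push_back(cp);
    return out;
}
```

With this shape, `char32_t` appears only as the value produced by dereferencing, not as a subscript result, and finding the n-th code point remains a linear walk rather than a constant-time index.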