Closed mapron closed 2 years ago
This "bug" needs to be looked at more comprehensively, and I would be very grateful if you could help me figure it out. Neither of these functions operates on a single iterator; each takes an appropriate argument in its entirety. When we pass in a string or a vector, everything works as expected, but raw strings have a quirk: the null terminator at the end. Should the terminator be counted or not? The STL convention says no. For example:
string_view str = "abcd"; // size of string is 4, not 5
Here is an example that illustrates the problem we would face if the current behavior changed:
constexpr char raw_str[] = u8"abcd"; // C++17: a u8 literal is an array of char, terminator included
assert(code_point_count(raw_str) == 5); // ok
const string stl_str = raw_str;
assert(code_point_count(stl_str) == 4); // ??
In the current implementation, the number of code points is 4 in both cases, as expected. My view is that it is better to take the path of least surprise and follow the standard. If someone wants to count the zero terminator at the end of the string, they can use the iterator version and do so explicitly:
constexpr char raw_str[] = u8"abcd"; // C++17: a u8 literal is an array of char, terminator included
assert(code_point_count(std::cbegin(raw_str), std::cend(raw_str)) == 5);
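To make the distinction concrete, here is a minimal sketch of how a whole-argument overload and an iterator overload can disagree about the terminator. Everything in it is hypothetical: `count_code_points` is a stand-in written for this illustration, not the library's `code_point_count`, and it assumes valid UTF-8 stored in `char`.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical stand-in, not the library's implementation.
// Counts UTF-8 code points in [first, last) by counting every byte that is
// not a continuation byte (bit pattern 10xxxxxx). Assumes valid UTF-8.
template <typename It>
std::size_t count_code_points(It first, It last) {
    std::size_t n = 0;
    for (; first != last; ++first) {
        if ((static_cast<unsigned char>(*first) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Whole-string overload: std::string never stores the terminator in its range.
inline std::size_t count_code_points(const std::string& s) {
    return count_code_points(s.begin(), s.end());
}

// Whole-array overload: drops one trailing '\0' so that a raw literal and the
// std::string built from it give the same answer.
template <std::size_t N>
std::size_t count_code_points(const char (&s)[N]) {
    const std::size_t len = (N > 0 && s[N - 1] == '\0') ? N - 1 : N;
    return count_code_points(s, s + len);
}

inline void demo() {
    const char raw_str[] = "abcd";                        // 5 chars with '\0'
    assert(count_code_points(raw_str) == 4);               // terminator dropped
    assert(count_code_points(std::string(raw_str)) == 4);  // same answer
    // The explicit iterator range covers the whole array, terminator included.
    assert(count_code_points(std::cbegin(raw_str), std::cend(raw_str)) == 5);
}
```

With this split, the caller who really wants the terminator counted has to ask for it through the iterator pair, which keeps the common case unsurprising.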
OK, I will look deeper into this at the weekend and run some tests. I cannot give an immediate answer off the top of my head.
Sorry for the late answer, I had a bad couple of days last weekend. Now to the point: add these lines to your test file:
I ran some tests, and you were damn right. std::cend for char[] literals points past the 0 terminator, not at it. Shame on me for working with C++ for so long and discovering it only now.
Well, I guess I do not use raw literals too often :)
Sorry for bothering you; your code is consistent with std::size and std::distance calls!
I want to talk about the code_unit_count/code_point_count functions (as I started to dive into your library, estimating the length was a point of interest for me).
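For anyone who wants to reproduce the observation, a small self-checking sketch using only the standard library follows (it assumes C++17; the names `raw` and `check_string_copy` are made up for this example):

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>

// A char array initialized from a literal has room for the terminating '\0'.
constexpr char raw[] = "abcd";

// The array extent, and therefore the [cbegin, cend) range, includes it.
static_assert(std::size(raw) == 5, "extent counts the terminator");
static_assert(std::distance(std::cbegin(raw), std::cend(raw)) == 5,
              "cend points one past the terminator");
static_assert(*(std::cend(raw) - 1) == '\0',
              "the last element of the range is the terminator");

// String types built from the same array stop at the terminator.
static_assert(std::string_view(raw).size() == 4, "string_view excludes it");

inline void check_string_copy() {
    const std::string copy = raw;  // copies up to, not including, '\0'
    assert(copy.size() == 4);
}
```

The static_asserts are checked at compile time, so the file compiling at all already confirms the array-versus-string difference.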
Then I found couple of
and felt bad about it, since I could get an unexpected result when passing an iterator pair myself (the STL taught me to use a past-the-end iterator, while your code expects a different meaning).