ericniebler / range-v3

Range library for C++14/17/20, basis for C++20's std::ranges
Other
4.05k stars 437 forks source link

UB in std::views::split #1810

Open OptimumCpp opened 4 months ago

OptimumCpp commented 4 months ago

This must actually become a std DR; but since you're the godfather of std::ranges, I bring my complaining to you. This is the original post pointing out the problem.

haleyk10198 commented 2 months ago

it's NAD. author was actually searching with "\0" inclusively.

on that note, i'd agree it's a surprise to developers and debatable whether we should have a specialization of std:;ranges::end for char arrays, but would require some expertise to verify if it breaks any other STL algorithms

OptimumCpp commented 2 months ago

With all due respect, it is a defect. Raw string literals have traditionally been treated as null-terminated strings. Thus, I argue that current behavior creates confusion and ambiguity. The case of raw arrays for character types needs to be explicitly deleted. We can use std::span for covering the last element; but a better solution would be to have a UDL for std::array of character types(char, wchar_t...). So, if someone intends to use a UTF8 character array, he could write u8"UTF8 array"arr.

در تاریخ جمعه ۱۲ آوریل ۲۰۲۴،‏ ۰۷:۲۴ Haley Kwok @.***> نوشت:

it's NAD. author was actually searching with "\0" inclusively.

— Reply to this email directly, view it on GitHub https://github.com/ericniebler/range-v3/issues/1810#issuecomment-2050920346, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADY5PPTVITUMIPFRL65DTYDY45LGPAVCNFSM6AAAAABDBEX57SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANJQHEZDAMZUGY . You are receiving this because you authored the thread.Message ID: @.***>

OptimumCpp commented 2 months ago

std::to_array works too.

در تاریخ جمعه ۱۲ آوریل ۲۰۲۴،‏ ۱۶:۴۵ Farid Mehrabi @.***> نوشت:

With all due respect, it is a defect. Raw string literals have traditionally been treated as null-terminated strings. Thus, I argue that current behavior creates confusion and ambiguity. The case of raw arrays for character types needs to be explicitly deleted. We can use std::span for covering the last element; but a better solution would be to have a UDL for std::array of character types(char, wchar_t...). So, if someone intends to use a UTF8 character array, he could write u8"UTF8 array"arr.

در تاریخ جمعه ۱۲ آوریل ۲۰۲۴،‏ ۰۷:۲۴ Haley Kwok @.***> نوشت:

it's NAD. author was actually searching with "\0" inclusively.

— Reply to this email directly, view it on GitHub https://github.com/ericniebler/range-v3/issues/1810#issuecomment-2050920346, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADY5PPTVITUMIPFRL65DTYDY45LGPAVCNFSM6AAAAABDBEX57SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANJQHEZDAMZUGY . You are receiving this because you authored the thread.Message ID: @.***>

haleyk10198 commented 2 months ago

range-v3 has ranges::view::c_str for developers to express the context in null-terminated string, which is effectively std::string_views. code works as trivially intended when the pattern is wrapped by them.

To remove ambiguity you'd need to delete std::ranges::end for char[]. doing that will break and surprise a lot of generic code that is working with T[ ]. It will be hard to convince ISO to change the convention a breaking way.

OptimumCpp commented 2 months ago

No. Changing old adapters is not the solution. A concept for all character types is needed, and the concept std::ranges::range needs to exclude character arrays. Just range=WHATEVER and not (is_array_v<T> and character<remove_extent_t<T>>);

is modern and should be tighter than legacy code. `std::end` is legacy, and we should leave it as is. در تاریخ جمعه ۱۲ آوریل ۲۰۲۴،‏ ۱۸:۳۱ Haley Kwok ***@***.***> نوشت: > range-v3 has ranges::view::c_str for developers to express the context in > null-terminated string, which is effectively std::string_views. code > works as trivially intended when the pattern is wrapped by them. > > To remove ambiguity you'd need to delete std::ranges::end for char[]. > doing that will break and surprise a lot of generic code that is working > with T[ ]. It will be hard to convince ISO to change the convention a > breaking way. > > — > Reply to this email directly, view it on GitHub > , > or unsubscribe > > . > You are receiving this because you authored the thread.Message ID: > ***@***.***> >
haleyk10198 commented 2 months ago

That will do no good but break a jaw dropping amount of code.

A better place for these changes should be compiler warnings, that would still keep developers away from surprises without breaking anything.

OptimumCpp commented 2 months ago
is a relativity new feature, with a lot of unfair sorrounding skepticism. I doubt that broken code is as widespread as you mentioned. Because it only affects a subset of . Compiler warning are not a bad choice; but making them smart to cope with recommended library constructs is not appealing; Library and core should act distinctively. If `static_assert` could take an `int` as its 3rd argument, we could pass it warning level values with some special value for absolute error. It would allow customizing library behavior; but again, modifying `concept ranges::range` is the most generic change, with minimum effort. در تاریخ شنبه ۱۳ آوریل ۲۰۲۴،‏ ۰۴:۴۸ Haley Kwok ***@***.***> نوشت: > That will do no good but break a jaw dropping amount of code. > > A better place for these changes should be compiler warnings, that would > still keep developers away from surprises without breaking anything. > > — > Reply to this email directly, view it on GitHub > , > or unsubscribe > > . > You are receiving this because you authored the thread.Message ID: > ***@***.***> >