Closed mulle-nat closed 4 months ago
ISO/IEC 9899:1999 (E) 7.24.6.3.2 "The mbrtowc function" seems unclear on this.
It would surprise me if people implemented it that way, as it contradicts expectations for e.g. strncmp() or strncpy() (i.e. your YMM example is AFAICT invalid for strncmp(), but maybe not?).
Fixing this isn't hard, though it will introduce a bit of complexity and nastiness in this very core function.
Let's take a look at the common implementations.
You are right. If interpreted as akin to strncmp(), it shouldn't be a problem. But because you can also give it invalid multibyte sequences and it's supposed to return error values, I was under the impression that n is more of a buffer-length kind of parameter. I hardly ever use wchar_t, so apologies if this is a red herring.
no, i appreciate filing the bug. i am not certain of my interpretation, and have wondered about this before (it also came up in a bug filed by @zhiayang a few weeks ago). we might end up naturally enforcing this as part of handling that request.
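To make the two readings of n concrete, here is a minimal sketch of a call that is valid under either one: it never tells mbrtowc() that more bytes are available than the NUL-terminated string actually holds. The helper names bounded_mb_len and decode_first are hypothetical, invented for this illustration, and are not taken from the project.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <string.h>   /* memset */
#include <wchar.h>    /* mbrtowc, mbstate_t */

/* Hypothetical helper: number of bytes of s that can safely be offered to
 * mbrtowc(), i.e. up to and including the terminating NUL, capped at
 * MB_LEN_MAX. It never reads past the NUL. */
static size_t bounded_mb_len(const char *s)
{
    size_t n = 0;
    while (n < MB_LEN_MAX && s[n] != '\0') {
        n++;
    }
    /* Count the NUL if we reached it, so an empty string decodes to L'\0'. */
    return n < MB_LEN_MAX ? n + 1 : n;
}

/* Hypothetical wrapper: decode the first character of a NUL-terminated
 * string without promising mbrtowc() more bytes than the string contains. */
static size_t decode_first(wchar_t *wc, const char *s)
{
    mbstate_t st;

    assert(wc != NULL && s != NULL);
    memset(&st, 0, sizeof st);   /* start from the initial conversion state */
    return mbrtowc(wc, s, bounded_mb_len(s), &st);
}
```

If n is only an upper bound in the strncmp() sense, the clamp is redundant but harmless; if n is a promise about readable bytes, the clamp is what keeps the call in bounds.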
here's an interesting link https://developers.redhat.com/articles/2022/09/17/gccs-new-fortification-level#2__better_fortification_coverage
One example is wcrtomb, where glibc makes stronger assumptions about the object size passed than POSIX allowed. Specifically, glibc assumes that the buffer passed to wcrtomb is always at least MB_CUR_MAX bytes long. In contrast, the POSIX description makes no such assumption. Due to this discrepancy, any application that passed a smaller buffer would potentially make wcrtomb overflow the buffer during conversion. Then the fortified version __wcrtomb_chk aborts with a buffer overflow, expecting a buffer that is MB_CUR_MAX bytes long. We fixed this bug in glibc-2.36 by making glibc conform to POSIX.
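For callers who cannot rely on a fixed glibc, one defensive pattern is to convert into a scratch buffer of MB_LEN_MAX bytes and copy out only what fits. This is a sketch under that assumption, not what glibc does internally; wcrtomb_bounded is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <string.h>   /* memcpy */
#include <wchar.h>    /* wcrtomb, mbstate_t */

/* Hypothetical wrapper: encode wc into dst only if the result fits in
 * dstlen bytes. Returns the number of bytes written, or (size_t)-1 if wc
 * is not representable or the encoding does not fit.
 * Note: *st is still advanced by wcrtomb() even when the result does not
 * fit, which matters for stateful encodings. */
static size_t wcrtomb_bounded(char *dst, size_t dstlen, wchar_t wc,
                              mbstate_t *st)
{
    char tmp[MB_LEN_MAX];   /* MB_LEN_MAX >= MB_CUR_MAX in every locale */
    size_t len;

    assert(dst != NULL && st != NULL);
    len = wcrtomb(tmp, wc, st);
    if (len == (size_t)-1) {
        return (size_t)-1;  /* wc has no encoding in the current locale */
    }
    if (len > dstlen) {
        return (size_t)-1;  /* would overflow the caller's buffer */
    }
    memcpy(dst, tmp, len);
    return len;
}
```

The extra copy costs a few bytes per call, but the caller's buffer is never written past dstlen regardless of which assumption the C library makes.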
I read this as saying that mbrtowc(&wc, gcluster, MB_LEN_MAX, &mbt); used to be fine but is now wrong, or?
nah, it was an error in the implementation of wcrtomb() in glibc. just an interesting datum.
i think the Standard is fundamentally ambiguous here. i'd still like to go look at implementations, though that guarantees nothing about the future (your YMM example is very much the kind of thing i'm worried about).
egcpool:
MB_LEN_MAX on my system is 0x10, but the input gcluster string could very well be shorter. I believe mbrtowc could justifiably copy all MB_LEN_MAX bytes into some XMM register or so, and this might then fault if the string sits at the end of a page.
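The end-of-page scenario can be set up deliberately with a guard page. The following is a rough POSIX-only sketch, assuming mmap() with MAP_ANONYMOUS and mprotect() are available; probe_at_page_end is a hypothetical name. It places the two-byte string "A" (including its NUL) directly before an inaccessible page and calls mbrtowc() with a caller-chosen n.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* assumption: exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <string.h>       /* memset */
#include <sys/mman.h>     /* mmap, mprotect, munmap */
#include <unistd.h>       /* sysconf */
#include <wchar.h>        /* mbrtowc, mbstate_t */

/* Hypothetical probe: store "A\0" in the last two bytes of a readable page
 * that is followed by a PROT_NONE page, then call mbrtowc() with the given
 * n. Returns 0 and stores mbrtowc()'s result in *result, or returns -1 if
 * the mapping could not be set up. */
static int probe_at_page_end(size_t n, size_t *result)
{
    long ps = sysconf(_SC_PAGESIZE);
    unsigned char *base;
    char *s;
    wchar_t wc;
    mbstate_t st;

    assert(result != NULL);
    if (ps < 2) {
        return -1;
    }
    base = mmap(NULL, 2 * (size_t)ps, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return -1;
    }
    /* Make the second page inaccessible so any over-read faults. */
    if (mprotect(base + ps, (size_t)ps, PROT_NONE) != 0) {
        munmap(base, 2 * (size_t)ps);
        return -1;
    }
    s = (char *)base + ps - 2;
    s[0] = 'A';
    s[1] = '\0';
    memset(&st, 0, sizeof st);   /* initial conversion state */
    *result = mbrtowc(&wc, s, n, &st);
    munmap(base, 2 * (size_t)ps);
    return 0;
}
```

Calling it with n = 2 stays in bounds under either interpretation. Calling it with n = MB_LEN_MAX is the disputed case: a fault there would show that the implementation treats n as a count of readable bytes, while a clean return proves nothing about other implementations or future versions.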