dankamongmen / notcurses

blingful character graphics/TUI library. definitely not curses.
https://nick-black.com/dankwiki/index.php/Notcurses
Other
3.4k stars 113 forks source link

Potential memory access overflow in utf8_egc_len #2762

Closed mulle-nat closed 4 months ago

mulle-nat commented 4 months ago

egcpool:

static inline int
utf8_egc_len(const char* gcluster, int* colcount){
  size_t ret = 0;
  *colcount = 0;
  int r;
  mbstate_t mbt;
  memset(&mbt, 0, sizeof(mbt));
  wchar_t wc, prevw = 0;
  bool injoin = false;
  do{
    r = mbrtowc(&wc, gcluster, MB_LEN_MAX, &mbt);

MB_LEN_MAX on my system is 0x10, but the input gcluster string length could very well be less. I believe mbrtowc could justifyably copy all MB_LEN_MAX bytes into some XMM register or so and then this might then fault, if the string is at the end of a page.

dankamongmen commented 4 months ago

ISO/IEC 9899:1999 (E) 7.24.6.3.2 "The mbrtowc function" seems unclear on this.

It would surprise me if people implemented it that way, as it contradicts expectations for e.g. strncmp() or strncpy() (i.e. your YMM example is AFAICT invalid for strncmp(), but maybe not?).

Fixing this isn't hard, though it will introduce a bit of complexity and nastiness in this very core function.

Let's take a look at the common implementations.

mulle-nat commented 4 months ago

You are right. If interpreted as akin to strncmp, it shouldn't be a problem. But because you can also give it invalid multibyte sequences and its supposed to return error values, I was under the impression n is more a buffer length kinda parameter. I hardly ever use wchar_t, so apologies, if this is a red herring.

dankamongmen commented 4 months ago

no, i appreciate filing the bug. i am not certain of my interpretation, and have wondered about this before (it also came up in a bug filed by @zhiayang a few weeks ago). we might end up naturally enforcing this as part of handling that request.

dankamongmen commented 4 months ago

here's an interesting link https://developers.redhat.com/articles/2022/09/17/gccs-new-fortification-level#2__better_fortification_coverage

One example is wcrtomb, where glibc makes stronger assumptions about the object size passed than POSIX allowed. Specifically, glibc assumes that the buffer passed to wcrtomb is always at least MB_CUR_MAX bytes long. In contrast, the POSIX description makes no such assumption. Due to this discrepancy, any application that passed a smaller buffer would potentially make wcrtomb overflow the buffer during conversion. Then the fortified version __wcrtomb_chk aborts with a buffer overflow, expecting a buffer that is MB_CUR_MAX bytes long. We fixed this bug in glibc-2.36 by making glibc conform to POSIX .

mulle-nat commented 4 months ago

I read this as that mbrtowc(&wc, gcluster, MB_LEN_MAX, &mbt); used to be fine, but now is wrong or ?

dankamongmen commented 4 months ago

I read this as that mbrtowc(&wc, gcluster, MB_LEN_MAX, &mbt); used to be fine, but now is wrong or ?

nah, it was an error in the implementation of wcrtomb() in glibc. just an interesting datum.

i think the Standard is fundamentally ambiguous here. i'd still like to go look at implementations, though that guarantees nothing about the future (your YMM example is very much the kind of thing i'm worried about).