9fans / plan9port

Plan 9 from User Space
https://9fans.github.io/plan9port/
Other
1.63k stars 319 forks source link

all: deviations from correct standard C in plan9port and required corresponding compiler flags #313

Open nsajko opened 4 years ago

nsajko commented 4 years ago

Plan9port has a whole lot of bugs that stem from the convoluted history of C before and during standardization. Those should all be fixed (I hope nobody disagrees), which will take time.

GCC and Clang provide a few compiler flags for choosing nonstandard variations/dialects of C, so in the mean time we can apply the workaround of adding the needed compiler flags.

-fno-strict-aliasing should for some time be passed by 9c to GCC and Clang to disable strict enforcing of standard C aliasing rules. Maybe most programmers do not know about the "strict aliasing" rules mandated by the C standard and the compilers. Quoting the GCC manual (GCC is more lax than the standard):

In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an "unsigned int" can alias an "int", but not a "void*" or a "double". A character type may alias any other type.

So filling variables of one type using pointers to another type, a quite common pattern, is usually incorrect. See the following links for elaboration:

I think this is an example from src/lib9/_p9dialparse.c https://github.com/9fans/plan9port/blob/master/src/lib9/_p9dialparse.c#L39

Another compiler option that should probably be turned on by 9c for some time is -fno-strict-overflow. This is similar to the "strict aliasing" situation in that most programmers do not know that signed integers do not wrap around on overflow, rather signed integer overflow is a (programmer) error. Formally it is a class of "undefined behavior", meaning that the compiler (by default) assumes signed integer variables WILL NEVER overflow, and in a situation where a misguided programmer might expect his program to overflow a certain variable, literally anything can happen instead.

This seems to be an example in the hash function in src/cmd/mk/symtab.c: https://github.com/9fans/plan9port/blob/master/src/cmd/mk/symtab.c#L29 . The fix is simple: use an unsigned type for wraparound overflowing.

EDIT: actually, I think we do not need -fno-strict-overflow, as undefined overflow situations are actually easy to catch at runtime using ubsan.

The following compiler flag is currently passed by 9c to GCC and Clang: -fsigned-char. In standard C it is unspecified if the char type is signed or unsigned, thus the programmer must use either unsigned char or signed char if the signedness matters. The flag makes char be signed. I am mentioning the flag here just so people would know (or would not forget) about the places in the code that depend on it, and thus need work. Or, alternatively, if there is no such code left, the flag could maybe be removed?

nsajko commented 4 years ago

0c9c620f39e56c42802504003fd05664aba670a4 fixed some aliasing issues, but left a "quick-fix" on https://github.com/9fans/plan9port/blob/master/src/libmemdraw/draw.c#L2376

nsajko commented 4 years ago

Another class of issues is creating out-of-bounds pointers, using them, or using out-of-bounds array indices; eg. in various implementations of getcallerpc. Yes, you heard right, just creating an out of bounds pointer is undefined behavior.

Related: I remember there was some code which accessed struct elements like they were part of an array. This is incorrect on multiple fronts: it is an out-of-bounds access, and it assumes a particular (null) padding between structure fields.

pete commented 4 years ago

-fno-strict-aliasing should for some time be passed by 9c to GCC and Clang to disable strict enforcing of standard C aliasing rules.

It may be worth noting that a lot of the code in question was authored by the people that created C and that the standard goes off the rails on occasion. The authors of the Linux kernel have no plans to eliminate the kernel's use of -fno-strict-aliasing and it uses -fno-strict-overflow liberally. It's a pain and the aliasing rules are kind of absurd.

https://github.com/9fans/plan9port/blob/master/src/lib9/_p9dialparse.c#L39

This looks like the code is being used exactly as intended for BSD sockets. A sockaddr_storage pointer is designed to be dereferenced after casting to either a sockaddr_in or a sockaddr_in6. This does not validate any aliasing rules. Either it's fine or you have a bug in your libc.

most programmers do not know that signed integers do not wrap around on overflow, rather signed integer overflow is a (programmer) error.

I don't think there exist many ones' complement machines any more. gcc doesn't support ones' complement, though I am not sure whether clang does or not.

Formally it is a class of "undefined behavior", meaning that the compiler (by default) assumes signed integer variables WILL NEVER overflow,

The compiler is free to do so, per the standard. A sane compiler targeting a twos' complement machine should assume twos' complement representation. In the case of gcc, only twos' complement is supported, though for some reason you have to pass a flag to make the compiler assume twos' complement.

and in a situation where a misguided programmer might expect his program to overflow a certain variable, literally anything can happen instead.

It is unfortunate that this "literally anything can happen" line gets repeated so often. UB is specified in the standard to allow compiler writers to make assumptions and to warn programmers not to make certain assumptions, as they may not hold. In this case, you can tell the compiler to change its assumptions. "Literally anything can happen" is a bogeyman: UB means that the compiler may assume that either P or !P holds, or it may decline to make an assumption. If it does anything besides that, you should probably uninstall the compiler.

It's probably a better idea to use "-fwrapv" (which instructs the compiler to assume wrapping) than "-fno-strict-overflow" (which instructs the compiler to make no assumptions about wrapping), but one of the two should probably be used until someone makes a machine that isn't twos' complement and gcc starts targeting that architecture and an OS that runs P9P gets ported to it and someone wants to run P9P on it.

The flag makes char be signed.

The code assumes a signed char, the flag allows that assumption.

nsajko commented 4 years ago

@pete If you want to discuss C, the standards, or the compilers, I would be glad to do that, but using another channel (preferably email). The reason is that those discussions could get long, but are not quite pertinent to this issue (unless the p9p maintainers think otherwise).

Some things I will say in response to your seeming urging to disregard the standard and use compiler flags for selecting nonstandard dialects:

  1. Right now the compiler flags are not even used by 9c. EDIT: I meant -fno-strict-aliasing

  2. Being more compliant to the standards (actually, GCC and Clang) as opposed to using some ad-hoc dialect of C means being friendlier to possible contributors AND it enables the compilers to generate more efficient code (in the case of strict aliasing and strict signed overflow).

  3. Keep in mind that the compilers' authors are in the best position to reckon about the semantics of C and the way some C code needs to be written to get certain desired machine code. They know what is more and what is less hard to optimize.

As an example, signed overflow being an error enables certain important optimizations. And writing code with that in mind is easy.

I think the biggest problem with C today is that (probably because of its convoluted history) it is being taught wrong, leading to it being used wrong (and then the user gets pissed at the standards committees and compilers' writers).

nsajko commented 4 years ago

Consider type T, which is either less wide than int, or signed, or both less wide than int and signed. Usually it is unsigned char.

There is a class of bugs where there is a left shift of a T-typed value. The cause of the bug is implicit promotion to integer (if T is less wide than int) and the fact that signed integer overflow is an error. In other words, the compiler could assume that the shift expression will never overflow, which was not the programmer's intention.

I think that most of those bugs will be fixed by https://github.com/9fans/plan9port/pull/319 , but some will surely remain; so here is a list of some lines of code with suspicious left shifts:

src/libdraw/getsubfont.c:                       v = ((src[x/pack] << ((x%pack)*f->bits->depth)) >> (8 - f->bits->depth)) & mask;

src/libmp/port/mpvecdigmuladd.c:        x = p1<<(Dbits/2);
src/libmp/port/mpvecdigmuladd.c:        x = p2<<(Dbits/2);

src/libmemdraw/fillpoly.c:              onehalf = 1 << (fixshift-1);
src/libmemdraw/fillpoly.c:              onehalf = 1 << (fixshift-1);

src/cmd/venti/srv/bloom.c:              z = 1<<(x&31);
src/cmd/venti/srv/bloom.c:              if(!(tab[(x&b->bitmask)>>5] & (1<<(x&31))))

src/cmd/bzip2/lib/blocksort.c:#define       SET_BH(zz)  bhtab[(zz) >> 5] |= (1 << ((zz) & 31))
src/cmd/bzip2/lib/blocksort.c:#define     CLEAR_BH(zz)  bhtab[(zz) >> 5] &= ~(1 << ((zz) & 31))
src/cmd/bzip2/lib/blocksort.c:#define     ISSET_BH(zz)  (bhtab[(zz) >> 5] & (1 << ((zz) & 31)))
src/cmd/bzip2/lib/compress.c:   s->bsBuff |= (v << (32 - s->bsLive - n));

src/cmd/upas/bayes/dfa.c:       u[(r%4096)/32] |= 1<<(r%32);

src/libmach/dwarfget.c: if(v&(1<<(nb-1)))

src/libauthsrv/passtokey.c:                     key[i] = (t[i] >> i) + (t[i+1] << (8 - (i+1)));
src/libauthsrv/opasstokey.c:            key[n] = (t[n] >> n) + (t[n+1] << (8 - (n+1)));