Open gbonnema opened 7 years ago
This is a big one. We should be careful. Is there any reason as far as you can see why they would do this? It makes no sense... why would they write the functions this way in the first place, and how did it persist this long?
No mention in the documentation as far as I saw.
I checked Stack Overflow via Google and found that there sometimes is a reason, which matches my (admittedly unclear) story about EOF being an integer with a negative value. The standard does not specify the value further. This means that if the value lies outside the range of an unsigned char, you have to use an int to compare with EOF. However, the standard also specifies that all characters (EOF excepted) must fit in an unsigned char.
In the past all this was not specified as clearly as it is now (C11). So what we have here is old and inconsistent. We need to find a way to make it consistent without hurting functionality. But comparing a signed char with EOF is definitely very dangerous, and we need to find a way around it.
Anyways, it all needs careful consideration before we start changing. Maybe change step by step.
Let's keep a checklist here of files where we need to fix this. I'm still wary -- even if the specification wasn't as clear back then, it was still good practice to avoid this kind of thing. It also makes me suspect there may be a lot more surprises where this came from.
I wonder if this is related to the mystery memory leak.
Honestly, I don't think this is the cause of a memory leak. But it could be the cause of false negatives: when you store in a signed char an int whose value is larger than a signed char can hold, a comparison of that char with the original int will come out unequal, even though the received value may have been equal.
EDIT: I suspect that issue #30 identifies the cause (if there is only one cause). EDIT2: No it is not.
I think the only remedy is to stick with the C11 spec and use unsigned char for everything, except when you expect an int from system functions like fgetc that can return an out-of-band negative value like EOF. Per the Linux man page, fgetc returns an unsigned char cast to an int, or EOF on end of file or error. That is why we should take care in handling the result of fgetc: receive it in an int, and only cast to unsigned char after checking for EOF.
The problem in our programs is that ints and chars are mixed. For instance, we have a struct where an item is declared as int but is only ever assigned a character constant like 'a', 'b', etc.
I can list the programs where this happens. At the moment I think solving the warnings I get from gcc has a higher priority, because these warnings could obscure errors.
The splint warning is:
The program contains a mishmash of int, char, and even signed char, all used to refer to character data. This deserves further investigation, because the constants the characters are compared against are not always within the range of a signed char, though they may be within the range of an unsigned char.
The other issue is that the C standard (the WG14 draft; I couldn't afford the real spec) specifies that EOF "expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream". Comparing a signed char to EOF is an invitation to trouble and depends heavily on the implementation.
The problem is that char and int are used interchangeably for character data, and the type doesn't reflect the intention of the data. I need some time to figure this one out, because it touches almost all of the functions.