goblint / analyzer

Static analysis framework for C
https://goblint.in.tum.de
MIT License
184 stars 75 forks source link

Unsound for MemorySafety for some library functions #1263

Open michael-schwarz opened 11 months ago

michael-schwarz commented 11 months ago

We currently warn about accesses to stdin for code such as this

int main() {
    char *tmp;

    char inputBuffer[14U] = {(char)'\000'};

    tmp = fgets((char *)(& inputBuffer),50,stdin); //NB! Too many bytes read
}

[Warning][Behavior > Undefined > Other] Pointer stdin contains an unknown address. Invalid dereference may occur (2.c:11:5-11:50) [Warning][Unknown] Pointer stdin has a points-to-set of top. An invalid memory access might occur (2.c:11:5-11:50) [Warning][Behavior > Undefined > MemoryOutOfBoundsAccess][CWE-823] Size of pointer stdin is top. Memory out-of-bounds access might occur due to pointer arithmetic (2.c:11:5-11:50)

I think it does not make sense to produce such warnings for stdin, but instead we would need to wran for the invalid writes to inputBuffer.

mrstanb commented 11 months ago

I think this could be configured to hold for many variables besides stdin, such as stdout, stderr, etc., right?

sim642 commented 11 months ago

Doing it by the FILE type would cover all these and more. Access does so for ignoring races within these structures that should be treated as opaque.

michael-schwarz commented 11 months ago

Actually, we should hold off on doing this for now, as these warnings currently save us from being unsound in many cases:

#include<stdio.h>
int main() {
    char *tmp;

    char inputBuffer[14U] = {(char)'\000'};

    tmp = fgets((char *)(& inputBuffer),50,stdin); //Reading too many chars
    free(tmp2);
}

This is a typical example of a buffer-overflow, and we currently do not produce warnings for it. We should address this after SV-COMP, potentially with specifying constraints for correct usage in the library function mechanism.

sim642 commented 11 months ago

potentially with specifying constraints for correct usage in the library function mechanism.

Since such buffer overflow and file descriptor related things can grow quite complex, we might want to start thinking about not reinventing the wheel and reusing more general library function specifications. For example those used by Mopsa or Frama-C (ACSL).