goblint / analyzer

Static analysis framework for C
https://goblint.in.tum.de
MIT License
160 stars 72 forks source link

Proposal: Unify handling of malloc and escaping locals #1491

Open michael-schwarz opened 1 month ago

michael-schwarz commented 1 month ago

Currently, malloc blobs are syntactical globals, i.e., they become shared between threads upon their creation. However, additional such variables can be created after having entered multi-threaded mode. Maybe it would be a good idea to not make them syntactic globals, but use some mechanism to escape them as soon as their address is leaked into something that is a global (and emit escape events)?

This would allow unifying the handling of escaping locals and malloc blobs somewhat. It would mean that there is only flavor of introducing new global variables into the global invariant to handle instead of two. Hopefully, one such mechanism would be less fragile than what we currently have.

This may also have some precision benefits in cases where a malloc is thread-local and the block never needs to be escaped, as it can be handled flow-sensitively in this case, (and updates may sometimes even be destructive).

The challenges are:

Maybe it also time for us to rethink all this Escape, Malloc, ... business, provide sound theoretical foundations for it, and write it up somewhere. I don't think there is much existing work on this, but it is clearly of relevance for thread-modular analyses of globals.

sim642 commented 1 month ago

This may also have some precision benefits in cases where a malloc is thread-local and the block never needs to be escaped, as it can be handled flow-sensitively in this case, (and updates may sometimes even be destructive).

This indeed is a value-based variation of what mallocFresh tried to do just for races. Since we couldn't correctly nail that down, I suspect it will only be more involved for value analysis (considering how tricky protection-based value-analysis turned out to be compared to protection-based race analysis).

Standard recency analysis might be the way to deal with non-uniqueness, but I don't know a way to easily implement it in our framework. In an allocation loop, the second iteration must turn recent things into non-recent (non-unique) things, so all analyses that use varinfos must know to do a deep substitution before the second allocation at a site. But it might already have propagated into global unknowns that cannot just be substituted like that.