goblint / analyzer

Static analysis framework for C
https://goblint.in.tum.de
MIT License
184 stars 75 forks source link

Reconsider notion of path-sensitive but context-insensitive analysis #1509

Open michael-schwarz opened 5 months ago

michael-schwarz commented 5 months ago

For #1508 I was playing around with various configurations to check that they still work, and it seems that Goblint allows to have an analysis that is path-sensitive, but not context-sensitive.

However, it seems unclear what that is supposed to do: Currently, the effect is that while paths are kept separate, they are then all propagated to callees. This seems unnecessary, it should be possible to remember (hashes) of the original paths and then only combine values from compatible paths.

At first glance, it might also make sense to only allow path-sensitivity for context-sensitive analyses: However, the information that is kept by the two different sensitivities is (potentially) different, which is in itself again dubious: If I have path-sensitivity active, would it not make sense to also consider this path-sensitive information part of the context.

I think this merits a principled re-design, and it should also be documented what these settings do.

sim642 commented 5 months ago

However, it seems unclear what that is supposed to do: Currently, the effect is that while paths are kept separate, they are then all propagated to callees. This seems unnecessary, it should be possible to remember (hashes) of the original paths and then only combine values from compatible paths.

The current behavior (https://github.com/goblint/analyzer/blob/60ecfe7eb23fcdd927eed52da45c9da755dc7fa3/src/framework/constraints.ml#L802-L810) should be that all paths are propagated from the caller to the callee separately and the result of each is combined with the corresponding caller state. That's why enter returns a list of pairs (caller state, callee state) and combine is applied pairwise. paths_as_set (added during the longjmp work) is also used to do this kind of splitting somewhere else.

In our constraint-based view, the two sensitivities have aspects that the other cannot replace:

  1. Path-sensitivity (as a set of paths) could be extended to do more semantic joining and widening between paths. Our current approach has been that they're all disjoint sublattices but they don't have to be. It's just more complicated to define the suitable (cofibered) domain. This kind of approach could avoid some of the recently discovered termination issues from blowup of constraint unknowns from contexts that we have no way to generalize over (there's no global view of all of the contexts to reason about).
  2. Context-sensitivity (as component of constraint unknown) provides more fine-grained dependency structure in the solver. But as it has been so far, there's no general and systematic way to prevent explosion of contexts without considering specific analyses (and combinations thereof) that lead to such things. This is an assumption of our solver termination proofs (finitely many unknowns are queried) but something we have no way of guaranteeing right now. Also, this cannot support arbitrary path splits (e.g. failing locking) in demand-driven solving.
michael-schwarz commented 5 months ago

Yes, I agree that there are subtle differences between both features and that we should have both, but I still think that the interaction between them is perhaps not particularly well thought-out yet.

Consider, e.g., the following program:

// PARAM: --set ana.ctx_insens[+] threadflag --set ana.ctx_insens[+] threadid --set ana.ctx_insens[+] base --set ana.path_sens[+] threadflag
// Fully context-insensitive, but thread is path-sensitive
#include <goblint.h>
#include <pthread.h>

int foo(int arg) {
  return arg;
}

void *t_fun(void *arg) {
  int r = foo(2);
  __goblint_check(r == 2); // Currently fails in one path and succeeds in the other
}

int main() {
  int r = foo(1);
  __goblint_check(r == 1); // Currently fails in one path and succeeds in the other

  pthread_t id;
  pthread_create(&id, NULL, t_fun, NULL);

  return 0;
}

I think a more intuitive way for path-sensitivity to work across call-boundaries (that still should be relatively easy to implement and almost for free):

This would be how I would path-sensitivity in an interprocedural setting to work intuitively: If the different paths happen to be kept apart inside the procedure, we get some notion of context-sensitivity in the path information for free. Otherwise, we don't, but at least we only get propagated those paths into which our path was potentially joined instead of getting all of them.

I'm not sure if I am expressing myself clearly here, if not, please tell me and I might try to write up something more formal.