Reconsider notion of path-sensitive but context-insensitive analysis

michael-schwarz commented 5 months ago

For #1508 I was playing around with various configurations to check that they still work, and it seems that Goblint allows to have an analysis that is path-sensitive, but not context-sensitive.

However, it seems unclear what that is supposed to do: Currently, the effect is that while paths are kept separate, they are then all propagated to callees. This seems unnecessary, it should be possible to remember (hashes) of the original paths and then only combine values from compatible paths.

At first glance, it might also make sense to only allow path-sensitivity for context-sensitive analyses: However, the information that is kept by the two different sensitivities is (potentially) different, which is in itself again dubious: If I have path-sensitivity active, would it not make sense to also consider this path-sensitive information part of the context.

I think this merits a principled re-design, and it should also be documented what these settings do.

sim642 commented 5 months ago

However, it seems unclear what that is supposed to do: Currently, the effect is that while paths are kept separate, they are then all propagated to callees. This seems unnecessary, it should be possible to remember (hashes) of the original paths and then only combine values from compatible paths.

The current behavior (https://github.com/goblint/analyzer/blob/60ecfe7eb23fcdd927eed52da45c9da755dc7fa3/src/framework/constraints.ml#L802-L810) should be that all paths are propagated from the caller to the callee separately and the result of each is combined with the corresponding caller state. That's why enter returns a list of pairs (caller state, callee state) and combine is applied pairwise. paths_as_set (added during the longjmp work) is also used to do this kind of splitting somewhere else.

In our constraint-based view, the two sensitivities have aspects that the other cannot replace:

Path-sensitivity (as a set of paths) could be extended to do more semantic joining and widening between paths. Our current approach has been that they're all disjoint sublattices but they don't have to be. It's just more complicated to define the suitable (cofibered) domain. This kind of approach could avoid some of the recently discovered termination issues from blowup of constraint unknowns from contexts that we have no way to generalize over (there's no global view of all of the contexts to reason about).
Context-sensitivity (as component of constraint unknown) provides more fine-grained dependency structure in the solver. But as it has been so far, there's no general and systematic way to prevent explosion of contexts without considering specific analyses (and combinations thereof) that lead to such things. This is an assumption of our solver termination proofs (finitely many unknowns are queried) but something we have no way of guaranteeing right now. Also, this cannot support arbitrary path splits (e.g. failing locking) in demand-driven solving.

michael-schwarz commented 5 months ago

Yes, I agree that there are subtle differences between both features and that we should have both, but I still think that the interaction between them is perhaps not particularly well thought-out yet.

Consider, e.g., the following program:

// PARAM: --set ana.ctx_insens[+] threadflag --set ana.ctx_insens[+] threadid --set ana.ctx_insens[+] base --set ana.path_sens[+] threadflag
// Fully context-insensitive, but thread is path-sensitive
#include <goblint.h>
#include <pthread.h>

int foo(int arg) {
  return arg;
}

void *t_fun(void *arg) {
  int r = foo(2);
  __goblint_check(r == 2); // Currently fails in one path and succeeds in the other
}

int main() {
  int r = foo(1);
  __goblint_check(r == 1); // Currently fails in one path and succeeds in the other

  pthread_t id;
  pthread_create(&id, NULL, t_fun, NULL);

  return 0;
}

Because of path-sensitivity, the information inside foo is kept separate all the time, but then both paths propagate to both callers.
For this analysis this can of course be remedied by making threadflag context-sensitive.
However, this is not always possible as:
- the context- and path-sensitivity keep track of potentially different things, and
- (imagine a more complicated lattice) keeping things separate based on contexts at procedure start could be expensive as they are forcibly kept apart based on the start state throughout the procedure, which may be a meaningless distinction based on how the values evolve later.

I think a more intuitive way for path-sensitivity to work across call-boundaries (that still should be relatively easy to implement and almost for free):

When calling enter, tag the path with some path id: $$i \in \mathbb{N}_0$$
Inside the procedure, maintain for each path a set of possible origin paths: $$I \subseteq \mathbb{N}_0$$ This set should get joined as usual when the path-sensitivity decides to join things, and should not play a role in this decision what to keep separate
Inside combine, return $$\bot$$ if the path-id used to enter the procedure is not included in the set of origins at the return point.

This would be how I would path-sensitivity in an interprocedural setting to work intuitively: If the different paths happen to be kept apart inside the procedure, we get some notion of context-sensitivity in the path information for free. Otherwise, we don't, but at least we only get propagated those paths into which our path was potentially joined instead of getting all of them.

I'm not sure if I am expressing myself clearly here, if not, please tell me and I might try to write up something more formal.

goblint / analyzer

Reconsider notion of path-sensitive but context-insensitive analysis #1509