goblint / cil

C Intermediate Language
https://goblint.github.io/cil/
Other
133 stars 20 forks source link

Fix syntactic search #147

Closed stilscher closed 1 year ago

stilscher commented 1 year ago

Fixes #146.

The syntactic search looks up the corresponding (possibly differently named) varinfo and location in the environment and checks whether it is within the function currently in focus. The check whether the location is within the function only considers the starting line number. Also, as the end of the function the starting line number of the "next" function is taken. There can however be several different files, so the line number is not a sufficiently precise representation of the location. Also the start line number of the next function might be in a different file and therefore does not correspond with the actual end line number of the function (apart from this I am not sure if any reasonable ordering of the functions in the list can even be assumed).

In the example from the issue, this leads to function main reaching from line 4-5 (5 is the start line of function pthread_once), and so no variable use of fail is found within.

Changes:

This also fixes the issue https://github.com/goblint/gobview/issues/7#issue-1162653541. The expression of the semantic query is only evaluated on the results from the syntactic analysis, so if no variable uses are found in the syntactic search, the semantic search will also look broken. I could not find an example where the semantic analysis is still broken (apart from what is fixed in https://github.com/goblint/analyzer/pull/1092#issue-1765349856).

Related PRs: https://github.com/goblint/analyzer/pull/1092#issue-1765349856, https://github.com/goblint/gobview/pull/25#issue-1765365288

michael-schwarz commented 1 year ago

Goblint-CIL aims to support OCaml >= 4.05, so things such as List.find_map that are more recent cannot be used here.

michael-schwarz commented 1 year ago

This also fixes goblint/gobview#7 (comment).

I am extremely skeptical that this indeed fixes that issue. We had identified that it was due to prune pruning the entire hashtable because of some hash issue. This seems unrelated to the issue you are fixing here.

michael-schwarz commented 1 year ago

Ok, I am sure this is a stupid question, but here it comes: Why rely on something so brittle as line numbers here at all?

stilscher commented 1 year ago

I am extremely skeptical that this indeed fixes that issue. We had identified that it was due to prune pruning the entire hashtable because of some hash issue. This seems unrelated to the issue you are fixing here.

I did some more investigation: The test case from the semantic search issue actually works in GobView also without the proposed changes from this (and the related) PRs. My fixes are necessary only for other test cases (such as the one described in #146, also with semantic queries) that contain pseudo-return nodes or where the computation with the line numbers is unlucky. I am however not able to reconstruct the false behavior of GobView anymore (no matter which of the current versions), where no results are found for the semantic query. This is confusing, because I am convinced that I did observe this unexpected behavior earlier.

stilscher commented 1 year ago

Ok, I am sure this is a stupid question, but here it comes: Why rely on something so brittle as line numbers here at all?

CIL renames duplicate variable names in its so-called alpha-conversion. These renamings are stored in a hash table that maps variable names from the c source code to (possibly several) tuples of varinfo (which possibly has a different name) and location (where the varinfo with the new name was created). The function find_uses_in_fun_var tries to find all variable occurrences in a specific function body and for this looks up all (across the complete project) possible renamings of the searched variable in the hashtable. Because those renamings are not restricted to occurrences in the relevant functions, they are filtered with this location comparison.

Here is an example:

1 void f(int i) {
2   int i = 0; // renamed to i___0
3   int i___1 = 0;
4   i++;
5   return;
5 }

int main() {
  int i = 0;
  int i = 0; // renamed to i___0;
  int i = 1; // renamed to i___1
  f(i);
  return 0;
}

Querying for variables of name i in the body of function named f should lead to finding occurrences in lines 2 and 4. Without the line number check, the additional renaming i -> i___1 in main is retrieved from the map and the additional occurrence in line 3 is wrongfully found.

I was wondering if one could avoid this check by searching for the varinfo, instead of just its name, but I think this would need some more extensive refactoring.

michael-schwarz commented 1 year ago

Thanks for the clarification! I feel like I knew this at some point, but have forgotten it since then.

michael-schwarz commented 1 year ago

@stilscher what is blocking this?