goblint / cil

C Intermediate Language
https://goblint.github.io/cil/
Other
136 stars 20 forks source link

`Cabs2cil.environment` contains duplicate `vid`s #31

Closed sim642 closed 3 years ago

sim642 commented 3 years ago

I started looking at https://github.com/goblint/analyzer/issues/195 and immediately found a witness generation issue that was unrelated to my HoareMap.

Running the following generates witness.graphml where invariants use the variable name __builtin_bswap32 instead of i:

./goblint --conf conf/svcomp21.json --sets ana.specification ./tests/sv-comp/unreach-call-__VERIFIER_error.prp ./tests/sv-comp/basic/for_true-unreach-call.c

Bisecting the issue leads me to this: https://github.com/goblint/analyzer/pull/169, but as far as I can see, that has nothing to do with the problem, which seems to be somewhat nondeterministic instead.

The variable names in witness generation undergo a replacement step, which replaces CIL's renamed stuff with their original values (https://github.com/goblint/analyzer/blob/419679cebdc9e667c8821783b988da05958176bc/src/domains/invariantCil.ml). This is based on #17.

If I do the following in Goblint to print out Cabs2cil.environment:

    let l = Hashtbl.fold (fun original_name (envdata, _) acc ->
        match envdata with
        | Cabs2cil.EnvVar vi ->
          (original_name, vi) :: acc
        | _ -> acc
      ) Cabs2cil.environment []
    in
    let l = List.stable_sort (fun (_, {vid=lvid; _}) (_, {vid=rvid; _}) -> lvid - rvid) l in
    List.iter (fun (original_name, vi) ->
        ignore (Pretty.printf "vi.vid = %d, original_name = %s, vi.vname = %s\n" vi.vid original_name vi.vname)
      ) l;

Then I find out that the mapping contains varinfos with different names but duplicate vids and that's the cause for the wrong variable name in the witness:

vi.vid = 318, original_name = __builtin_bswap32, vi.vname = __builtin_bswap32
vi.vid = 318, original_name = i, vi.vname = i
sim642 commented 3 years ago

This seems to be related to merging files, because witness generation/SV-COMP mode of Goblint also analyzes includes/sv-comp.c. Both seem to add the builtin functions again in CIL, but with somehow overlapping ID ranges.

If I also enable custom_libc, then that analyzes includes/lib.c as a third file, then there will be three copies of each builtin function.

sim642 commented 3 years ago

Alright, I've figured out what exactly is happening:

  1. First file is parsed by CIL, its variables get IDs [0, A] and entries are added to Cabs2cil.environment with those IDs.
  2. Second file is parsed by CIL, its variables get IDs [A+1, B] and entries are added to Cabs2cil.environment with those IDs.
  3. The two files are merged by CIL and some variables (e.g. builtins, standard library etc) are unified. But this process doesn't cause any overlaps, but just leaves variables with some IDs unused in the program, because they were replaced with the identical variable with another ID. I think it may also introduce some new variables (conflicting static definitions), but those get IDs [B+1, C] and entries are added to Cabs2cil.environment with those IDs (at least I think).
  4. Then we call Partial.globally_unique_vids, which renumbers every used variable ID into [0, X]. (Even though we're about to remove Partial, Goblint will have a copy of this simple renumbering functionality: https://github.com/goblint/analyzer/pull/194/files.)
  5. Since the renumbering happens through a mutable record field, the varinfos are physically equal to some of the EnvVar data inside Cabs2cil.environment, which through this secret connection is automagically updated to match the new IDs, even though we never go back to fix it to be consistent!

But the copies of varinfo that were ignored for constructing the merged file were not renumbered and their data inside Cabs2cil.environment still has their original IDs in [0, C]. And some of those might just happen to overlap with the renumbered ones in [0, X], which causes the duplication.

I'm surprised that this issue only showed up now, because AFAIK it has been possible ever since #17 exposed the internal mapping and also in our svcomp21 build. It requires some extraordinary coincidence though: the order of things in a hashtable must happen to be such that the duplicates happen to be the wrong way around exactly for the variable relevant for a witness invariant.

I think the easiest solution is to just not do Partial.globally_unique_vids or the like.

michael-schwarz commented 3 years ago

I think the easiest solution is to just not do Partial.globally_unique_vids or the like.

This had me worried a bit, since we do rely on having unique vids in a couple of places. But then it seems like this enforced by CIL anyway by assigning increasing ids. So it seems like Partial.globally_unique_vids only ensures that the vids are consequent, which is not something we care about.

The only place where we would run into issues I guess is in loadBinaryFile, but I don't think we use this in Goblint.

https://github.com/goblint/cil/blob/7181c1396efa8c423ba7ec8a946c7f3043ca8cfb/src/cil.ml#L5219-L5233

So I guess it is safe to remove globally_unique_vids altogether.

sim642 commented 3 years ago

This had me worried a bit, since we do rely on having unique vids in a couple of places. But then it seems like this enforced by CIL anyway by assigning increasing ids. So it seems like Partial.globally_unique_vids only ensures that the vids are consequent, which is not something we care about.

As far as I can see, this should be the case as long as all varinfos are appropriately created by the CIL functions. Goblint git history shows that Partial.globally_unique_vids has been there since the beginning, so no additional context about why it was ever necessary.

The only place where we would run into issues I guess is in loadBinaryFile, but I don't think we use this in Goblint.

I also noticed this when I was looking through how CIL generates the vids, but I don't understand what's special about the number 11. It may be some ancient hard-coded relic, because CIL adds a lot (hundreds) of builtin function declarations. And globally_unique_vids would renumber things independently of nextGlobalVID, so it doesn't seem related anyway.

sim642 commented 3 years ago

I'm just thinking now that maybe Cabs2cil.environment should say somewhere that it exposes just the mapping for initially CIL-generated varinfos and their numbering.

Also #17 added and exposed Cabs2cil.varnameMapping, which somehow also seems to be related to CIL renaming variables. But it's entirely based on variable names as strings and holds no connection to varinfos or their vids, so I thought it'd be safer to avoid trying to undo the renaming just on names (which don't necessarily have to be unique, right?) and use the more low-level environment instead. What's the intended connection between these things though?

michael-schwarz commented 3 years ago

It is indeed superfluous and I removed it in #32.