goblint / bench

The benchmark suite

zstd #16

Open michael-schwarz opened 2 years ago

michael-schwarz commented 2 years ago
time ./goblint ../../bench-repos/zstd --disable ana.base.context.non-ptr --disable sem.unknown_function.spawn --set ana.thread.domain plain --enable exp.earlyglobs --set ana.base.privatization none --enable dbg.print_dead_code --enable justcil  --disable include_stdlib --set cppflags[+] -DZSTD_NO_INTRINSICS -v &> zstd.cil.c

real    0m5,836s
user    0m5,495s
sys     0m0,308s

So parsing and everything is also non-horrible here. The program does contain threads.

With ./goblint zstd.cil.c --disable ana.base.context.non-ptr --disable sem.unknown_function.spawn --set ana.thread.domain plain --enable exp.earlyglobs --set ana.base.privatization none --enable dbg.print_dead_code &> zstd.cil.out -v

Summary for all memory locations:
    safe:         2388
    vulnerable:     14
    unsafe:         16
    -------------------
    total:        2418

vars = 170735    evals = 432968  

Timings:
TOTAL                          224.473 s
  parse                           0.544 s
  convert to CIL                  0.694 s
  analysis                       223.237 s
    global_inits                    1.085 s
    solving                        221.102 s
      S.Dom.equal                     0.712 s
      postsolver                     52.939 s
    warn_global                     0.131 s
Timing used
Memory statistics: total=296341.32MB, max=934.02MB, minor=296304.66MB, major=2087.03MB, promoted=2050.37MB
    minor collections=141306  major collections=28 compactions=0
Found dead code on 3427 lines (including 3087 in uncalled functions)!
Total lines (logical LoC): 21415
michael-schwarz commented 2 years ago

Ah, and it requires https://github.com/goblint/analyzer/pull/557 again

michael-schwarz commented 2 years ago

It works non-incrementally, but when I try to run it (on master) incrementally (without any changes to the code) I get

Fatal error: exception Invalid_argument("List.iter2")
Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45
Called from UpdateCil.update_ids.reset_fun in file "src/incremental/updateCil.ml", line 46, characters 4-70
Called from Stdlib__list.iter in file "list.ml", line 110, characters 12-15
Called from UpdateCil.update_ids in file "src/incremental/updateCil.ml", line 126, characters 2-43
Called from Maingoblint.diff_and_rename in file "src/maingoblint.ml", line 457, characters 22-92
Called from Maingoblint.main in file "src/maingoblint.ml", line 509, characters 110-130
Called from Stdlib.at_exit.new_exit in file "stdlib.ml", line 554, characters 59-63
Called from Stdlib.do_at_exit in file "stdlib.ml" (inlined), line 560, characters 20-61
Called from Std_exit in file "std_exit.ml", line 18, characters 8-20

@stilscher said she'd take a look. May be related to https://github.com/goblint/analyzer/issues/574

sim642 commented 2 years ago

The stack trace indeed looks very similar to the issue I accidentally happened upon: https://github.com/goblint/analyzer/pull/542#issuecomment-1023077417. However, it doesn't actually stem from there being a different number of statements, but rather of locals:

List.iter2 (fun l o_l -> l.vid <- o_l.vid) f.slocals old_f.slocals;
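A minimal OCaml sketch (not Goblint's code; only the `List.iter2` call above is quoted from updateCil.ml) of why this raises: the stdlib's `List.iter2` throws `Invalid_argument "List.iter2"` as soon as the two lists run out at different points, which is exactly what happens when `update_ids` pairs up two same-named functions whose local variable lists differ in length. The variable names below are made up for illustration.

```ocaml
let () =
  (* Hypothetical locals of the "old" and "new" function versions:
     same name in the CIL file, but a different number of locals. *)
  let old_locals = ["a"; "b"; "c"] in
  let new_locals = ["a"; "b"] in
  try
    (* iter2 processes pairs until one list is exhausted, then raises. *)
    List.iter2 (fun l o_l -> Printf.printf "%s <- %s\n" l o_l) new_locals old_locals
  with Invalid_argument msg ->
    Printf.printf "Invalid_argument(%S)\n" msg
(* prints:
   a <- a
   b <- b
   Invalid_argument("List.iter2") *)
```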
stilscher commented 2 years ago

We managed to get the incremental analysis to work with the additional option --set cppflags[+] -D_FORTIFY_SOURCE=0.

The problem was multiple functions with the name realpath in the CIL file. The comparison and updating of ids for the incremental analysis assumes that the functions in the CIL file have unique names. The realpath definition can be found in the /usr/include/x86_64-linux-gnu/bits/stdlib.h header, which is only included when the fortify level is greater than 0. We have not looked into why multiple realpath functions exist in the CIL file; it might be related to the merging errors I found in #8. However, analyzing with the above-mentioned flag could be a reasonable approach for now.

There are also duplicate global variables in the CIL file that need to be looked at.

stilscher commented 2 years ago

A not too rigorous configuration that worked well for an analysis (took 5-6min for the non-incremental run) is:

./goblint ../test-repos/zstd/ --disable sem.unknown_function.spawn --enable exp.earlyglobs --set ana.base.privatization none --enable incremental.save --enable dbg.print_dead_code --set cppflags[+] -DZSTD_NO_INTRINSICS --set cppflags[+] -D_FORTIFY_SOURCE=0 &> zstd.cil.out -v

Additionally enabling intervals led to a steadily increasing number of contexts, and I aborted after ~30 min. In the incremental run (without any changes), a large amount of time was spent on postsolving. So it would be interesting to see how long the incremental analysis (without and with changes) takes when including the incremental postsolving from @sim642 that is currently on the interactive branch.

michael-schwarz commented 2 years ago

You might also want to use -DGOBLINT_NO_BSEARCH to remove some merging conflicts.

If we want to get the runtime a bit higher, we could also try:

michael-schwarz commented 2 years ago

Also, instead of deleting the contents of assert.h, it suffices to set -DGOBLINT_NO_ASSERT as an additional preprocessor flag.

michael-schwarz commented 2 years ago

After bumping goblint-cil, we now have zero CIL warnings when working with this repo :muscle:

jerhard commented 2 years ago

For the incremental run on zstd, even on a program without any changes, solving was reported to take around 3 seconds out of ~8-12 seconds total. This is noteworthy because ideally solving should take ~0 seconds for unchanged code.

Yesterday we (@stilscher, @michael-schwarz and I) looked further into this. It turns out that most of this incremental solving time, more than 2.7 seconds, was spent on the hashcons relifting. If one disables hashconsing, the overall runtime for a non-incremental run degrades somewhat, but not too much (roughly from 131 seconds to 138 seconds). There was a significant impact on memory usage though, IIRC. @stilscher has the precise numbers.

There is no easy fix to get rid of the relifting in the non-server incremental mode, as weak hashtables used for hashconsing cannot be marshaled (at least not directly). For the server mode, one could probably get away with skipping the relifting in the solver, but having to run experiments in server mode might make things more complicated.

One might just prefer to use the incremental analysis without hash-consing.

stilscher commented 2 years ago

Here are the numbers that we collected for a non-incremental run with hashconsing:

TOTAL                          131.208 s
  parse                           0.497 s
  convert to CIL                  0.683 s
  analysis                       130.029 s
    createCFG                       0.502 s
    global_inits                    0.671 s
    solving                        126.195 s
      vs                              0.001 s
      Sol'.solve                     125.932 s
        relift                          0.000 s
        S.Dom.equal                     0.414 s
        postsolver                     29.456 s
      split_solution                  0.263 s
    warn_global                     0.063 s
      access                          0.050 s
Timing used
Memory statistics: total=497409.75MB, max=1878.66MB, minor=497216.06MB, major=3929.75MB, promoted=3736.05MB
    minor collections=237121  major collections=28 compactions=0

and without hashconsing

TOTAL                          138.896 s
  parse                           0.535 s
  convert to CIL                  0.698 s
  analysis                       137.664 s
    createCFG                       0.499 s
    global_inits                    0.076 s
    solving                        129.479 s
      vs                              0.001 s
      Sol'.solve                     128.823 s
        relift                          0.000 s
        S.Dom.equal                     4.364 s
        postsolver                     41.818 s
      split_solution                  0.657 s
    warn_global                     0.086 s
      access                          0.068 s
Timing used
Memory statistics: total=486846.38MB, max=2484.53MB, minor=486616.93MB, major=4642.06MB, promoted=4412.62MB
    minor collections=232065  major collections=28 compactions=0
michael-schwarz commented 2 years ago

I tried adding "ZSTD_customMalloc", "ZSTD_customCalloc" and "POOL_create" as malloc wrappers in the hope of getting the number of race warnings down, but that did not help.
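For reference, a sketch of how these wrappers could be declared on the command line. This assumes the list-valued option is named ana.malloc.wrappers, as in the Goblint options schema; adjust if the installed version differs.

```shell
# Declare zstd's custom allocators as malloc wrappers (option name assumed):
./goblint zstd.cil.c \
  --set ana.malloc.wrappers[+] ZSTD_customMalloc \
  --set ana.malloc.wrappers[+] ZSTD_customCalloc \
  --set ana.malloc.wrappers[+] POOL_create
```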

sim642 commented 2 years ago

If inline merging is enabled again, aren't all the "is used for two distinct globals" warnings just due to the silly way the zstd Makefile uses relative paths? If you read the warnings, they are all about mem.h, referenced in different ways:

../lib/common/mem.h
../lib/legacy/../common/mem.h
../lib/dictBuilder/../common/mem.h
../lib/decompress/../common/mem.h
../lib/compress/../common/mem.h
sim642 commented 2 years ago

Yesterday I looked into the data races Goblint finds in zstd. Besides the valid data race on threadLimit as pointed out here, there are two kinds of spurious races we still find if we additionally activate symb_locks (but not var_eq!):

  1. In POOL_create_advanced there are accesses to a freshly allocated struct before the threads are created in a loop. A freshness/thread-locality/escape analysis should hopefully be able to rule these out.
  2. Accesses in functions like POOL_add_internal, which are called from functions that always call them with a lock held (POOL_add). Ideally we'd have a symbolic lock held there, but var_eq is required to preserve the symbolic lock from the outer function as the pointers get passed in via arguments to the inner function.

Unfortunately, also enabling var_eq causes the analysis not to terminate within at least 40 minutes:

runtime: 00:40:23.881
vars: 2497599, evals: 3116757
max updates: 32 for var node 4469 "ret = (FIO_prefs_t *)((FIO_prefs_t *)tmp);" on programs/fileio.c:229:5-229:71

|rho|=2497599
|called|=325
|stable|=2465598
|infl|=2497070
|wpoint|=232
|side_dep|=0
|side_infl|=0
Found 260457 contexts for 1193 functions. Top 5 functions:
24901   contexts for entry state of MEM_readST___17 (172569) on lib/common/mem.h:212:1-212:115
18684   contexts for entry state of MEM_isLittleEndian___18 (172565) on lib/common/mem.h:158:1-176:1
18583   contexts for entry state of MEM_read32___18 (172567) on lib/common/mem.h:210:1-210:110
17048   contexts for entry state of MEM_64bits___18 (172564) on lib/common/mem.h:156:1-156:95
11366   contexts for entry state of ZSTD_NbCommonBytes___4 (172783) on lib/compress/zstd_compress_internal.h:698:1-792:1

Memory statistics: total=5314342.41MB, max=7600.24MB, minor=5313863.33MB, major=40634.39MB, promoted=40155.32MB
    minor collections=2533967  major collections=70 compactions=0

With just symb_locks, it terminates in just 5 minutes:

runtime: 00:04:56.073
vars: 258037, evals: 797082
max updates: 57 for var node 2417 "flNb ++;" on programs/zstdcli.c:1249:9-1256:9

|rho|=258037
|called|=124
|stable|=236877
|infl|=257600
|wpoint|=1858
|side_dep|=0
|side_infl|=0
Found 12771 contexts for 1709 functions. Top 5 functions:
64  contexts for entry state of longCommandWArg (169760) on programs/zstdcli.c:428:1-434:1
58  contexts for entry state of FSE_readNCount_bmi2 (24917) on lib/common/entropy_common.c:234:1-245:1
58  contexts for entry state of FSE_readNCount_body (170219) on lib/common/entropy_common.c:69:1-215:1
58  contexts for entry state of FSE_readNCount_body_default (170220) on lib/common/entropy_common.c:218:1-223:1
57  contexts for entry state of BITv05_reloadDStream (175604) on lib/legacy/zstd_v05.c:857:1-884:1

Memory statistics: total=838726.42MB, max=1074.12MB, minor=838637.35MB, major=4296.13MB, promoted=4207.05MB
    minor collections=399930  major collections=37 compactions=0

Notably, var_eq causes an extreme number of contexts to appear compared to without it, so that's probably the source of non-termination. I haven't yet tried looking into what variable equalities end up collecting into those contexts.

sim642 commented 2 years ago

Notably, var_eq causes an extreme number of contexts to appear compared to without it, so that's probably the source of non-termination. I haven't yet tried looking into what variable equalities end up collecting into those contexts.

The issue is avoided using --set ana.ctx_insens[+] var_eq. Then the analysis terminates reasonably and the previously problematic inner accesses get the required symbolic lock.
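A sketch of the full combination described above, assuming the usual Goblint option syntax (ana.activated for enabling analyses, ana.ctx_insens for making one context-insensitive); exact spellings may differ between versions:

```shell
# Enable symb_locks and var_eq, but make var_eq context-insensitive
# to avoid the context blow-up observed above:
./goblint zstd.cil.c \
  --set ana.activated[+] symb_locks \
  --set ana.activated[+] var_eq \
  --set ana.ctx_insens[+] var_eq
```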

sim642 commented 2 years ago

By the way, the bigger-benchmarks confs don't declare ZSTD_customMalloc and ZSTD_customCalloc as malloc wrappers, so all the allocations are joined into a single constraint unknown with a supertop value. Any function pointers written to allocated structs are thus simply forgotten and would not be considered called when later dereferenced. This could have big implications for soundness...
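A hypothetical conf fragment showing how the wrappers could be declared in such a benchmark conf file, assuming the ana.malloc.wrappers option from the Goblint options schema (the surrounding structure of the real confs may differ):

```json
{
  "ana": {
    "malloc": {
      "wrappers": ["ZSTD_customMalloc", "ZSTD_customCalloc", "POOL_create"]
    }
  }
}
```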

sim642 commented 2 years ago

Using https://github.com/goblint/analyzer/pull/690, it's actually possible to get rid of all the spurious races now.

sim642 commented 2 years ago

After all the unsoundness saga and subsequent precision improvement attempt, it's revealed that zstd has some intricate race-freedom reasons, which Goblint is far from being able to handle: https://github.com/goblint/analyzer/pull/707#issuecomment-1111944059.

TimOrtel commented 1 year ago

I am getting this error when trying to run with efficiency.py:

Fatal error: exception Failure("No suitable function to start from.")

This is the config file I am using: https://github.com/TimOrtel/analyzer/blob/175cdf05c2f4209503bf02e81ae7a06259e89e73/conf/min_incr_zstd.json

Am I missing some kind of config option?

michael-schwarz commented 1 year ago

The most likely culprit would be that the compilation database is not generated correctly. Have you checked the contents of the compile_commands.json?

TimOrtel commented 1 year ago

compile_commands.json is empty: []

prepare.log has the following lines:

==> building with threading support
==> building zstd with .gz compression support
==> no liblzma, building zstd without .xz/.lzma support
==> no liblz4, building zstd without .lz4 support
LINK obj/conf_....../zstd
zstd build completed

sim642 commented 1 year ago

You probably need to do make clean and recreate the compilation database. If make has everything cached and does nothing, then the compilation database will not contain anything.
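A sketch of the suggested steps, using the compiledb tool (installed via pip) that this thread already relies on; compiledb records the compiler invocations that make actually executes, so a cached build yields an empty database:

```shell
# Start from a clean tree so make re-runs every compiler command,
# then let compiledb record them into compile_commands.json:
make clean
compiledb make
```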

TimOrtel commented 1 year ago

That cannot be the case, because the efficiency.py script clones zstd from git on every execution, so make is executed every time on a fresh zstd folder.

michael-schwarz commented 1 year ago

I think @stilscher is the one best equipped to help you here, she did the benchmarking for zstd for our last submission.

stilscher commented 1 year ago

I was not able to reproduce the problem. I ran python3 efficiency.py /absolute/path/to/the/analyzer 1 on the branch of your fork that contains the scripts (I fixed the make error by substituting Seq.exists in detectRenamedFunctionsRecursive.ml because it does not exist in the included module). prepare.log does not contain any error messages, a compilation database is created, and a main method is found and analyzed. Which compiledb version are you using?

TimOrtel commented 1 year ago

Perhaps I am using the wrong compiledb. I installed mine through pip. The version is 0.10.1. Is there another compiledb tool?

stilscher commented 1 year ago

Not that I know of. I am using the same version, also installed through pip.

TimOrtel commented 1 year ago

Are you sure you are on the correct branch? I don't get this make error. Perhaps we also have different opam switch configurations?