goblint / bench

The benchmark suite
4 stars 5 forks source link

OpenSSL #7

Open sim642 opened 2 years ago

sim642 commented 2 years ago

Attempt 1

./Configure
bear -- make
goblint -v .

Error:

Frontc is parsing /home/simmo/Desktop/openssl/.goblint/preprocessed/engines/e_afalg.i
/usr/include/stdlib.h[140:8-16] : syntax error
Parsing errorFatal error: exception Frontc.ParseError("Parse error")

The problem is _Float32 here:

# 140 "/usr/include/stdlib.h" 3 4
extern _Float32 strtof32 (const char *__restrict __nptr,
     char **__restrict __endptr)
     __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1)));

In engines/e_afalg.c:

/* Required for vmsplice */
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif

_GNU_SOURCE enables (through a number of intermediates) the following in the unpreprocessed stdlib.h:

#if __HAVE_FLOAT32 && __GLIBC_USE (IEC_60559_TYPES_EXT)
extern _Float32 strtof32 (const char *__restrict __nptr,
              char **__restrict __endptr)
     __THROW __nonnull ((1));
#endif

I think OpenSSL itself doesn't even use _Float32 or that function, but it just gets pulled in with _GNU_SOURCE among tons of other GNU-specific things.

Attempt 2

In floatn-common.h there is the following typedef:

#  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
typedef float _Float32;
#  endif

So in an extremely crude hack to bypass that and cause the typedef to be used in the standard headers, I tried the following to pretend to be GCC 6.0 for the purposes of preprocessing:

goblint -v . --set cppflags[+] '-D__GNUC__=6' --set cppflags[+] '-D__GNUC_MINOR__=0'

Error:

Frontc is parsing /home/simmo/Desktop/openssl/.goblint/preprocessed/engines/e_afalg.i
include/openssl/crypto.h[413:0-0] : syntax error
Parsing errorFatal error: exception Frontc.ParseError("Parse error")

The problem is the _Noreturn here:

# 413 "include/openssl/crypto.h"
_Noreturn void OPENSSL_die(const char *assertion, const char *file, int line);

_Noreturn is a C11 feature that CIL doesn't support (https://github.com/goblint/cil/issues/13).

Attempt 3

In an even more desperate attempt, I tried to just preprocess that keyword away (it's just extra information that technically shouldn't matter):

goblint -v . --set cppflags[+] '-D__GNUC__=6' --set cppflags[+] '-D__GNUC_MINOR__=0' --set cppflags[+] '-D_Noreturn='

Error:

/usr/include/x86_64-linux-gnu/bits/floatn.h:86: Error: Invalid combination of type specifiers
Error on A.TYPEDEF (Errormsg.Error)
Frontc is parsing /home/simmo/Desktop/openssl/.goblint/preprocessed/engines/e_capi.i
Error: There were parsing errors in /home/simmo/Desktop/openssl/.goblint/preprocessed/engines/e_capi.i
Fatal error: exception Errormsg.Error

The problematic code is __float128:

/* The type _Float128 exists only since GCC 7.0.  */
#  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
typedef __float128 _Float128;
#  endif

CIL has has some kind of support for it (https://github.com/goblint/cil/issues/8), but it defines both __float128 and _Float128 as builtin type names, so maybe that's screwing with the parsing of this. I think that aspect of CIL has been updated for GCC >=7.0, so my version hack above is probably causing this one to happen. The appropriate solution would be to handle the other GNU float types directly as well.

TODO

michael-schwarz commented 2 years ago

I can patch the _Noreturn thing, it shouldn't require much work. I guess at the same time I'll also cherry pick all C11 things from https://github.com/goblint/cil/pull/24 that do no require AST changes or programmatic support in Goblint to be sound.

michael-schwarz commented 2 years ago

Does https://github.com/goblint/cil/pull/60 resolve this?

michael-schwarz commented 2 years ago

After https://github.com/goblint/cil/pull/60 the issue now is _Atomic:

/usr/lib/gcc/x86_64-linux-gnu/7/include/stdatomic.h[40:9-16] : syntax error
Parsing errorFatal error: exception Frontc.ParseError("Parse error")
Raised at Errormsg.parse_error in file "src/ocamlutil/errormsg.ml", line 328, characters 2-27
Called from Stdlib__parsing.yyparse.loop in file "parsing.ml", line 151, characters 8-44
Called from Stdlib__parsing.yyparse in file "parsing.ml", line 164, characters 4-28
Re-raised at Stdlib__parsing.yyparse in file "parsing.ml", line 183, characters 8-17
Called from Stats.repeattime.repeatf in file "src/ocamlutil/stats.ml", line 106, characters 10-15
Re-raised at Stats.repeattime.repeatf in file "src/ocamlutil/stats.ml", line 110, characters 1-8
Called from Frontc.parse_to_cabs_inner in file "src/frontc/frontc.ml", line 189, characters 15-90
Re-raised at Frontc.parse_to_cabs_inner in file "src/frontc/frontc.ml", line 205, characters 6-73
Called from Frontc.parse_to_cabs in file "src/frontc/frontc.ml", line 121, characters 13-38
Called from Frontc.parse_helper in file "src/frontc/frontc.ml", line 256, characters 13-32
typedef _Atomic _Bool atomic_bool;
typedef _Atomic char atomic_char;
typedef _Atomic signed char atomic_schar;
typedef _Atomic unsigned char atomic_uchar;
typedef _Atomic short atomic_short;
typedef _Atomic unsigned short atomic_ushort;
typedef _Atomic int atomic_int;
michael-schwarz commented 2 years ago

With https://github.com/goblint/cil/pull/61 there are some warnings during the merging phase, but CFG generation succeeds and solving starts.

michael-schwarz commented 2 years ago
Summary for all memory locations:
        safe:           47
        vulnerable:    117
        unsafe:         26
        -------------------
        total:         190

vars = 7941    evals = 185490  

Timings:
TOTAL                          1724.763 s
  parse                          13.398 s
  convert to CIL                 18.656 s
  analysis                       1692.710 s
    global_inits                   16.474 s
    solving                        1674.621 s
      S.Dom.equal                     0.535 s
      postsolver                      7.987 s
    warn_global                     0.016 s
Timing used
Memory statistics: total=3913961.25MB, max=3778.66MB, minor=3913810.16MB, major=95671.32MB, promoted=95520.23MB
    minor collections=1866348  major collections=115 compactions=0
sim642 commented 2 years ago

Thanks for looking into the parsing errors. I'm guessing with all the patches that's with the commands I used in my first attempt, without any cppflags hackery?

Also, what about the __float128 problem from my third attempt? Is it somehow not a problem anymore? Was it just a problem because of my cppflags hacks?

michael-schwarz commented 2 years ago

I'm guessing with all the patches that's with the commands I used in my first attempt, without any cppflags hackery?

Yup, just with the first command.

Also, what about the __float128 problem from my third attempt? Is it somehow not a problem anymore?

Seems like it, I get no error there.

michael-schwarz commented 2 years ago
Found dead code on 21040 lines (including 21039 in uncalled functions)!
Total lines (logical LoC): 22475

Still somewhat on the small side :/

sim642 commented 2 years ago

Something seems off there. Why would it take 30s to parse just 22000 lines? And why would it be almost all dead?

michael-schwarz commented 2 years ago

The generated CIL file is 262000 lines. I guess a blowup of 10 because of CIL and not counting prototypes etc is realistic, right?

michael-schwarz commented 2 years ago

The reason so many things are dead is because it is combining several definitions of main, keeping only one. So things that are not called in that specific main we are keeping are dead.

michael-schwarz commented 2 years ago

Here there 6 different definitions of main():

Warning: def'n of func main at test/shlibloadtest.c:254 (sum 221463748258) conflicts with the one at test/ssl_old_test.c:868 (sum -1877117245207174749)
Warning: def'n of func main at test/memleaktest.c:35 (sum 2272283) conflicts with the one at test/ssl_old_test.c:868 ...
Warning: def'n of func main at test/versions.c:15 (sum 2555) conflicts with the one at test/ssl_old_test.c:868 ...
Warning: def'n of func main at test/rsa_complex.c:23 (sum 198) conflicts with the one at test/ssl_old_test.c:868 ...
Warning: def'n of func main at test/moduleloadtest.c:40 (sum 1613636) conflicts with the one at test/ssl_old_test.c:868 ...
sim642 commented 2 years ago

Maybe the default OpenSSL configuration doesn't make all the command line utilities? Because those are just tests, but I thought there should be the openssl binary as well.

Also, does the line count roughly match up if checked with cloc or something?

michael-schwarz commented 2 years ago

cloc says 418430 lines of C code

michael-schwarz commented 2 years ago

Might be an issue with bear 2.3.11 from Ubuntu 18.04

With https://github.com/nickdiego/compiledb I get a compilation db that is a lot bigger, but also new failures /home/michael/Documents/goblint-cil/analyzer/.goblint/preprocessed/crypto/aes/aes-x86_64.i

michael-schwarz commented 2 years ago

After https://github.com/goblint/analyzer/pull/530, it chokes

crypto/lhash/lhash.c[171:20-20] : syntax error

on this wonderful snippet:

# 171 "crypto/lhash/lhash.c" 3 4
   __extension__ ({ __auto_type __atomic_store_ptr = (
# 171 "crypto/lhash/lhash.c"
   ((_Atomic int *)&lh->error)
# 171 "crypto/lhash/lhash.c" 3 4
   ); __typeof__ (*__atomic_store_ptr) __atomic_store_tmp = (
# 171 "crypto/lhash/lhash.c"
   (0)
# 171 "crypto/lhash/lhash.c" 3 4
   ); __atomic_store (__atomic_store_ptr, &__atomic_store_tmp, (
# 171 "crypto/lhash/lhash.c"
   memory_order_relaxed
# 171 "crypto/lhash/lhash.c" 3 4
   )); })
# 171 "crypto/lhash/lhash.c"
                                                  ;

which is what GCC turns calls to atomic_store_explicit into. :disappointed:

https://en.cppreference.com/w/c/atomic/atomic_store

The issue seems to be with the __auto_type feature https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Typeof.html ...

michael-schwarz commented 2 years ago

With https://github.com/goblint/cil/pull/67 and the support for __auto_type that that adds, CIL now runs out of memory (at 12GB) because there is so many things to be parsed apparently.

It in this case does not get to the Merging phase even but fails in Frontc when it is out of memory. Does not seem to be a specific file taht is causing this, memory usage just increases linearly for a long time.

Also, it takes several minutes to get to that state.

michael-schwarz commented 2 years ago

It is 1300 files it wants to parse out of which all are unique.

sim642 commented 2 years ago

Might be an issue with bear 2.3.11 from Ubuntu 18.04

With https://github.com/nickdiego/compiledb I get a compilation db that is a lot bigger, but also new failures /home/michael/Documents/goblint-cil/analyzer/.goblint/preprocessed/crypto/aes/aes-x86_64.i

That's interesting. I might have to try with a newer bear version on Ubuntu 21.04.

It in this case does not get to the Merging phase even but fails in Frontc when it is out of memory. Does not seem to be a specific file taht is causing this, memory usage just increases linearly for a long time.

Does Goblint's verbose output show it continuously parsing all the files as the memory usage is growing or maybe it gets stuck on one, where it does something weird that causes a memory leak?

michael-schwarz commented 2 years ago

Does Goblint's verbose output show it continuously parsing all the files as the memory usage is growing

Yes, each individual file does not seem to take too long, and the speed at which it parses each one seems constant too. Memory usage just keeps increasing until it is then killed at some point.

I'll try on the server that has more RAM than my machine. If it also runs out there, there probably is some bug (as opposed to just inefficient memory usage).

michael-schwarz commented 2 years ago

On the server it succeeds with peak memory usage at around 20GB and the CIL phase taking a few minutes.

sim642 commented 2 years ago

Here's some speculation: I guess if most of the files have some of the same includes copied in, they will be parsed repeatedly and there will be 1300 copies of them in memory by the end, each in its own Cil.file. Only then we merge the files, leaving single copies of everything.

What if for every file immediately after parsing also merge it? Essentially combining the currently separate parsing (List.map) and merging into a single fold. If the files are mergeable, they should be either way, but maybe this would avoid having to keep all 1300 Cil.files around at the same time, especially if they contain large amounts of common declarations.

I'm not sure how efficient CIL's merging is though, maybe it's too slow to be done after each parsed file. But we could then group the merging somehow, e.g. 10 files at a time, to find a middle road, if my speculation holds.

sim642 commented 2 years ago

I tried it again with the compilation database from bear 3.0.8. I'm confused about how that compares to the compilation databases you got:

I guess compiledb outputs all non-C stuff as well. That aes-x86_64.pl is originally a perl script.

Another thing I noticed is that there are two copies of every file: one for lib and one for shlib. So we definitely have to limit the configuration and make to certain parts of OpenSSL.

Using bear's compilation database, parsing also ran out of memory, so I suppose the database from the newer version seems more complete and contains sufficiently many things to reach that point.

I also tried my parse-merge-parse-...-merge strategy. The memory usage grew a lot slower, but all the merging after every step was also really slowing it down, so I'm not sure if it actually got any further before also running out of memory.

michael-schwarz commented 2 years ago

There is no aes-x86_64.i.

There is not on my system either. That was the problem where Goblint did not ignore the crypto/aes/aes-x86_64.s file and therefore looked for the .i file supposedly created from this one.

I also tried my parse-merge-parse-...-merge strategy. The memory usage grew a lot slower, but all the merging after every step was also really slowing it down, so I'm not sure if it actually got any further before also running out of memory.

If you want, you can push this to some branch and I can run it on the server with my compilation database. We would then see if it makes a difference w.r.t. to memory / time?

sim642 commented 2 years ago

If you want, you can push this to some branch and I can run it on the server with my compilation database. We would then see if it makes a difference w.r.t. to memory / time?

I opened a draft PR for it: https://github.com/goblint/analyzer/pull/532.

One thing I saw with those intermediate merges were a bunch of merging conflicts and whatnot from CIL.

sim642 commented 2 years ago

There is not on my system either. That was the problem where Goblint did not ignore the crypto/aes/aes-x86_64.s file and therefore looked for the .i file supposedly created from this one.

I mentioned above that it's actually a perl script, not an .s file. But looking at it, seems like the script is used to generate the .s file at some point in the compilation process and that's not reflected by the compilation database.

This makes me wonder how well compilation databases deal with generated sources. I guess as long as under bear/compiledb those generators actually run to create the file and the generated files stay around and eventually get passed to gcc, then hopefully it should be fine.

sim642 commented 2 years ago

Anyway, my fabulous merging idea didn't really help, but I discovered a possible parsing culprit: https://github.com/goblint/cil/blob/b77c663690519be8a672f871a036df3d89b677d5/src/frontc/cabs2cil.ml#L330-L332 That global hashtable in CIL (used for Gobview and witness generation) isn't intentionally cleared between parsing files, which means it will accumulate a massive amount of data.

When I remove the two H.add environment calls, it more than halves the memory usage during parsing. It actually manages to parse all 2165 entries from my bear compilation database in 13GB of RAM. At that point I didn't have enough free for the merging to work anymore.

sim642 commented 2 years ago

Here's something that might be manageable:

./Configure no-asm
make build_generated
bear -- make -j12 apps/openssl
goblint -v .

It only builds the openssl binary, not the entire library and tests and whatnot, so there will be 1000 compilation database entries. With the environment removal, it takes about 7GB to parse and merge. When it starts solving (after 88535 assigns to 2840 globals), the memory usage drops down to 2GB. I left it running for almost 30mins and it got to this point:

runtime: 00:29:09.304
vars: 17927, evals: 1335597

|rho|=17927
|called|=597
|stable|=11897
|infl|=17338
|wpoint|=5
Found 855 contexts for 115 functions. Top 5 functions:
83  contexts for entry state of CRYPTO_free on include/openssl/crypto.h:347:6-347:56
53  contexts for entry state of CRYPTO_malloc on include/openssl/crypto.h:342:6-342:60
46  contexts for entry state of memset on /usr/include/x86_64-linux-gnu/bits/string_fortified.h:56:1-61:1
46  contexts for entry state of CRYPTO_zalloc on include/openssl/crypto.h:343:6-343:60
28  contexts for entry state of CRYPTO_THREAD_run_once on include/openssl/crypto.h:533:5-533:66

Memory statistics: total=1910897.74MB, max=8740.27MB, minor=1910092.43MB, major=26372.86MB, promoted=25567.55MB
    minor collections=910882  major collections=53 compactions=1

The evals count kept going up, but none of the other numbers increased (called fluctuated up and down by a few). Not sure, maybe some termination problem?

michael-schwarz commented 2 years ago

Is it normal that the build fails for this configuration? If yes, how do we know that all meaningful calls to the compiler happened before the build was terminated (and are hence in the database)?

sim642 commented 2 years ago

I don't think it failed for me. Maybe you need to make clean before.

michael-schwarz commented 2 years ago

Ok, I needed to reclone for some reason.

With the smaller compilation database but without removing environment, parsing and merging takes about 4 minutes (the times report by CIL are somehow wrong, it reports a time of 109.383s).

sim642 commented 2 years ago

I might have to try with compiledb as well, just to be sure. Because it definitely took much longer than that for me to finish merging.

On another note, flipping the def_exc widen to join fixes that termination problem it seems:

runtime: 00:57:41.991
vars: 64354, evals: 95118

|rho|=64354
|called|=397
|stable|=56410
|infl|=63978
|wpoint|=8
Found 4564 contexts for 219 functions. Top 5 functions:
518 contexts for entry state of getrn on crypto/lhash/lhash.c:304:1-333:1
506 contexts for entry state of OPENSSL_LH_insert on include/openssl/lhash.h:84:6-84:55
505 contexts for entry state of ossl_check_ERR_STRING_DATA_lh_plain_type on include/openssl/err.h:362:358-362:490
304 contexts for entry state of CRYPTO_free on include/openssl/crypto.h:347:6-347:56
147 contexts for entry state of CRYPTO_malloc on include/openssl/crypto.h:342:6-342:60

Memory statistics: total=4130646.10MB, max=8740.27MB, minor=4128898.22MB, major=77863.06MB, promoted=76115.18MB
    minor collections=1968945  major collections=116 compactions=1

It has definitely reached a lot more variables now.

So we seem to have a much bigger benchmark now, but it's way too big for anything interactive if we're so hopelessly bottlenecked by parsing it.

michael-schwarz commented 2 years ago

Because it definitely took much longer than that for me to finish merging.

Was that with https://github.com/goblint/analyzer/pull/532? For me that leads to merging being horribly slow.

sim642 commented 2 years ago

No, it was without as well. I might try a grouping version of it at some point to see if there's any hope to keep the maximum memory usage lower without impacting performance notably.

sim642 commented 2 years ago

By the way, I let it running for the whole day on my laptop today and it got this far before I finally killed it:

runtime: 08:29:16.760
vars: 445228, evals: 727221

|rho|=445228
|called|=926
|stable|=373796
|infl|=444313
|wpoint|=6
Found 33743 contexts for 271 functions. Top 5 functions:
4839    contexts for entry state of getrn on crypto/lhash/lhash.c:304:1-333:1
4721    contexts for entry state of OPENSSL_LH_insert on include/openssl/lhash.h:84:6-84:55
4709    contexts for entry state of ossl_check_ERR_STRING_DATA_lh_plain_type on include/openssl/err.h:362:358-362:490
1883    contexts for entry state of CRYPTO_free on include/openssl/crypto.h:347:6-347:56
1059    contexts for entry state of CRYPTO_malloc on include/openssl/crypto.h:342:6-342:60

Memory statistics: total=30832386.18MB, max=15081.05MB, minor=30821416.24MB, major=608177.76MB, promoted=597207.82MB
    minor collections=14697314  major collections=340 compactions=1

And it didn't seem completely stuck because vars kept increasing too.

It might be smarter to try analyzing single test suites instead, because the openssl binary probably includes most of the library features as well.

michael-schwarz commented 2 years ago

Not sure if this is an issue with the database we obtain from compiledb or some other unsoundness, but I got the analysis to terminate quite quickly (apart from minutes of parsing and merging):

./Configure no-asm
make build_generated
compiledb make apps/openssl
./goblint-2 ../openssl -v --disable ana.base.context.non-ptr --disable sem.unknown_function.spawn --set ana.thread.domain plain --enable exp.earlyglobs --set ana.base.privatization none --enable dbg.print_dead_code &> openssl.txt

It finds a total of 67 live lines starting from main in apps/openssl.c.

vars = 3199    evals = 250  

Timings:
TOTAL                          262.916 s
  parse                          46.213 s
  convert to CIL                 62.895 s
  analysis                       153.809 s
    global_inits                   149.484 s
    solving                         0.951 s
      S.Dom.equal                     0.001 s
      postsolver                      0.305 s
    warn_global                     0.002 s
Timing used
Memory statistics: total=389135.44MB, max=15286.80MB, minor=387937.31MB, major=20213.36MB, promoted=19015.23MB
    minor collections=185046  major collections=29 compactions=0
Found dead code on 156129 lines (including 156049 in uncalled functions)!
Total lines (logical LoC): 156196

@sim642 did you run your previous experiments with --disable sem.unknown_function.spawn or without? Maybe all that was happening in your runs was that the unknown functions spawned everything that appeared somewhere?

Goblint was a72afa46a5cb826f7a936420428fef844dcf971c (with goblint-cil at https://github.com/goblint/cil/pull/67). Compile Database

Log

michael-schwarz commented 2 years ago
int main(int argc, char *argv[])
{
    FUNCTION f, *fp;
    LHASH_OF(FUNCTION) *prog = NULL;
    char *pname;
    const char *fname;
    ARGS arg;
    int global_help = 0;
    int ret = 0;

    arg.argv = NULL;
    arg.size = 0;

    /* Set up some of the environment. */
    bio_in = dup_bio_in(FORMAT_TEXT);
///!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    // Everything after this is dead(!)
///!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    bio_out = dup_bio_out(FORMAT_TEXT); // never called
    bio_err = dup_bio_err(FORMAT_TEXT); // never called

#if defined(OPENSSL_SYS_VMS) && defined(__DECC)
    argv = copy_argv(&argc, argv);
#elif defined(_WIN32)
    /* Replace argv[] with UTF-8 encoded strings. */
    win32_utf8argv(&argc, &argv);
#endif

#ifndef OPENSSL_NO_TRACE
    setup_trace(getenv("OPENSSL_TRACE"));
#endif

    if ((fname = "apps_startup", !apps_startup())
            || (fname = "prog_init", (prog = prog_init()) == NULL)) {
        BIO_printf(bio_err,
                   "FATAL: Startup failure (dev note: %s()) for %s\n",
                   fname, argv[0]);
        ERR_print_errors(bio_err); //dead
        ret = 1; //dead
        goto end;
    }
    pname = opt_progname(argv[0]); // dead

    default_config_file = CONF_get1_default_config_file(); //dead
    if (default_config_file == NULL) // dead
        app_bail_out("%s: could not get default config file\n", pname); //dead

    /* first check the program name */
    f.name = pname; //dead
    fp = lh_FUNCTION_retrieve(prog, &f); //dead
    if (fp == NULL) { //dead
        /* We assume we've been called as 'openssl ...' */
        global_help = argc > 1
            && (strcmp(argv[1], "-help") == 0 || strcmp(argv[1], "--help") == 0
                || strcmp(argv[1], "-h") == 0 || strcmp(argv[1], "--h") == 0);
        argc--; //dead
        argv++; //dead
        opt_appname(argc == 1 || global_help ? "help" : argv[0]); //dead
    } else {
        argv[0] = pname; // dead
    }

    /* If there's a command, run with that, otherwise "help". */
    ret = argc == 0 || global_help
        ? do_cmd(prog, 1, help_argv)
        : do_cmd(prog, argc, argv);

 end:
    OPENSSL_free(default_config_file);  // dead
    lh_FUNCTION_free(prog);  // dead
    OPENSSL_free(arg.argv);  // dead
    if (!app_RAND_write())  // dead
        ret = EXIT_FAILURE;  // dead

    BIO_free(bio_in); // dead
    BIO_free_all(bio_out); // dead
    apps_shutdown(); // dead
    BIO_free(bio_err); // dead
    EXIT(ret); // dead
}
sim642 commented 2 years ago

Your compilation database has 1005 entries, mine had 1007 I think, so that's probably correct. I didn't do any option tuning (other than the def_exc_widen_by_join thing before it was fixed).

67 lines out of 156196 being live is highly suspicious. I guess there's an unknown function through which most of it becomes live.

michael-schwarz commented 2 years ago

67 lines out of 156196 being live is highly suspicious.

Absolutely, I am digging through it to find the issue.

michael-schwarz commented 2 years ago

[Error][Imprecise][Unsound] Function definition missing for pthread_once (crypto/threads_pthread.c:144:9-144:38)

This seems to be the culprit. It should call its argument.

michael-schwarz commented 2 years ago

With https://github.com/goblint/analyzer/pull/547, more code is live

Live lines: 5162
Found dead code on 151647 lines (including 151422 in uncalled functions)!
Total lines (logical LoC): 156809
vars = 39042    evals = 183199  

Timings:
TOTAL                          442.992 s
  parse                          46.057 s
  convert to CIL                 62.759 s
  analysis                       334.177 s
    global_inits                   147.109 s
    solving                        183.133 s
      S.Dom.equal                     0.151 s
      postsolver                     28.937 s
    warn_global                     0.002 s
Timing used
Memory statistics: total=748382.60MB, max=20216.80MB, minor=746665.30MB, major=27637.81MB, promoted=25920.51MB
    minor collections=356106  major collections=29 compactions=0

However, this seems quite low, so I guess there is more issues.

michael-schwarz commented 2 years ago

Actually, we are not reaching a fixpoint in this example:

Fixpoint not reached at node 243216 "*src" on crypto/o_str.c:75:5-78:5
 Solver computed:
 {([Reversed (top or Set (Normal Lvals * booleans)):{}, Unit:(),
    top or Set (variables):{}, booleans:False, MT mode:Singlethreaded,
    Thread * lifted created and Unit:(main, bot),
    value domain * array partitioning deps * Vars with Weak Update * Unit:(mapping {
                                                                             Local {
                                                                               txtoid ->
                                                                                 (Array: (Unknown int([-7,7])), (50))
                                                                               l ->
                                                                                 (Unknown int([0,64]))
                                                                             }
                                                                             Parameter {
                                                                               dst ->
                                                                                 {&txtoid[def_exc:Unknown int([-63,63])]}
                                                                               src ->
                                                                                 {&((alloc@sid:242444), crypto/mem.c:184:5-184:23), &((alloc@sid:242444), crypto/mem.c:184:5-184:23)}
                                                                               size ->
                                                                                 (Not {0}([0,64]))
                                                                             }
                                                                             Temp {
                                                                               tmp ->
                                                                                 {&txtoid[def_exc:Unknown int([-63,63])], NULL, ?}
                                                                               tmp___0 ->
                                                                                 {NULL, ?, &((alloc@sid:242444), crypto/mem.c:184:5-184:23), &((alloc@sid:242444), crypto/mem.c:184:5-184:23)}
                                                                               tmp___1 ->
                                                                                 (Unknown int([0,64]))
                                                                             }
                                                                           }, mapping {
                                                                                  }, {}, ()),
    lifted node:Unknown node, Unit:()], mapping {
                                          })}
 Right-Hand-Side:
 {([Reversed (top or Set (Normal Lvals * booleans)):{}, Unit:(),
    top or Set (variables):{}, booleans:False, MT mode:Singlethreaded,
    Thread * lifted created and Unit:(main, bot),
    value domain * array partitioning deps * Vars with Weak Update * Unit:(mapping {
                                                                             Local {
                                                                               txtoid ->
                                                                                 (Array: (Unknown int([-7,7])), (50))
                                                                               l ->
                                                                                 (Unknown int([0,8]))
                                                                             }
                                                                             Parameter {
                                                                               dst ->
                                                                                 {&txtoid[def_exc:Unknown int([-63,63])]}
                                                                               src ->
                                                                                 {&((alloc@sid:242444), crypto/mem.c:184:5-184:23), &((alloc@sid:242444), crypto/mem.c:184:5-184:23)}
                                                                               size ->
                                                                                 (Not {0}([0,64]))
                                                                             }
                                                                             Temp {
                                                                               tmp ->
                                                                                 {&txtoid[def_exc:Unknown int([-63,63])], NULL, ?}
                                                                               tmp___0 ->
                                                                                 {NULL, ?, &((alloc@sid:242444), crypto/mem.c:184:5-184:23), &((alloc@sid:242444), crypto/mem.c:184:5-184:23)}
                                                                               tmp___1 ->
                                                                                 (Unknown int([0,64]))
                                                                             }
                                                                           }, mapping {
                                                                                  }, {}, ()),
    lifted node:Unknown node, Unit:()], mapping {
                                          })}
 Difference: ([Reversed (top or Set (Normal Lvals * booleans)):{}, Unit:(),
               top or Set (variables):{}, booleans:False,
               MT mode:Singlethreaded,
               Thread * lifted created and Unit:(main, bot),
               value domain * array partitioning deps * Vars with Weak Update * Unit:(mapping {
                                                                                        Local {
                                                                                          txtoid ->
                                                                                            (Array: (Unknown int([-7,7])), (50))
                                                                                          l ->
                                                                                            (Unknown int([0,8]))
                                                                                        }
                                                                                        Parameter {
                                                                                          dst ->
                                                                                            {&txtoid[def_exc:Unknown int([-63,63])]}
                                                                                          src ->
                                                                                            {&((alloc@sid:242444), crypto/mem.c:184:5-184:23), &((alloc@sid:242444), crypto/mem.c:184:5-184:23)}
                                                                                          size ->
                                                                                            (Not {0}([0,64]))
                                                                                        }
                                                                                        Temp {
                                                                                          tmp ->
                                                                                            {&txtoid[def_exc:Unknown int([-63,63])], NULL, ?}
                                                                                          tmp___0 ->
                                                                                            {NULL, ?, &((alloc@sid:242444), crypto/mem.c:184:5-184:23), &((alloc@sid:242444), crypto/mem.c:184:5-184:23)}
                                                                                          tmp___1 ->
                                                                                            (Unknown int([0,64]))
                                                                                        }
                                                                                      }, mapping {
                                                                                             }, {}, ()),
               lifted node:Unknown node, Unit:()], mapping {
                                                     }):
 not leq ([Reversed (top or Set (Normal Lvals * booleans)):{}, Unit:(),
           top or Set (variables):{}, booleans:False, MT mode:Singlethreaded,
           Thread * lifted created and Unit:(main, bot),
           value domain * array partitioning deps * Vars with Weak Update * Unit:(mapping {
                                                                                    Local {
                                                                                      txtoid ->
                                                                                        (Array: (Unknown int([-7,7])), (50))
                                                                                      l ->
                                                                                        (Unknown int([0,64]))
                                                                                    }
                                                                                    Parameter {
                                                                                      dst ->
                                                                                        {&txtoid[def_exc:Unknown int([-63,63])]}
                                                                                      src ->
                                                                                        {&((alloc@sid:242444), crypto/mem.c:184:5-184:23), &((alloc@sid:242444), crypto/mem.c:184:5-184:23)}
                                                                                      size ->
                                                                                        (Not {0}([0,64]))
                                                                                    }
                                                                                    Temp {
                                                                                      tmp ->
                                                                                        {&txtoid[def_exc:Unknown int([-63,63])], NULL, ?}
                                                                                      tmp___0 ->
                                                                                        {NULL, ?, &((alloc@sid:242444), crypto/mem.c:184:5-184:23), &((alloc@sid:242444), crypto/mem.c:184:5-184:23)}
                                                                                      tmp___1 ->
                                                                                        (Unknown int([0,64]))
                                                                                    }
                                                                                  }, mapping {
                                                                                         }, {}, ()),
           lifted node:Unknown node, Unit:()], mapping {
                                                 }) because Map: tmp___0 =
 HoarePO: {NULL, ?, ((alloc@sid:242444), crypto/mem.c:184:5-184:23), ((alloc@sid:242444), crypto/mem.c:184:5-184:23)} not leq {NULL, ?, ((alloc@sid:242444), crypto/mem.c:184:5-184:23), ((alloc@sid:242444), crypto/mem.c:184:5-184:23)}.
michael-schwarz commented 2 years ago

It is also very strange that ((alloc@sid:242444), crypto/mem.c:184:5-184:23) is in each set twice....

michael-schwarz commented 2 years ago

Besides that there are ~50 instances of Unknown function ptr called.

michael-schwarz commented 2 years ago

With the output of the diff after https://github.com/goblint/analyzer/pull/548, it gets even more interesting:

HoarePO:
{NULL, ?, ((alloc@sid:242444), crypto/mem.c:184:5-184:23)[def_exc:Unknown int([-63,63])], ((alloc@sid:242444), crypto/mem.c:184:5-184:23)} 
  not leq
{NULL, ?, ((alloc@sid:242444), crypto/mem.c:184:5-184:23)[def_exc:Unknown int([-63,63])], ((alloc@sid:242444), crypto/mem.c:184:5-184:23)[def_exc:Unknown int([-63,63])]}. 

and what the vids also printed:

  HoarePO: {NULL, ?, ((alloc@sid:242444)10406316353, crypto/mem.c:184:5-184:23)[def_exc:Unknown int([-63,63])], ((alloc@sid:242444)10406316353, crypto/mem.c:184:5-184:23)} not leq {NULL, ?, ((alloc@sid:242444)10406316353, crypto/mem.c:184:5-184:23)[def_exc:Unknown int([-63,63])], ((alloc@sid:242444)10406316353, crypto/mem.c:184:5-184:23)[def_exc:Unknown int([-63,63])]}. 
michael-schwarz commented 2 years ago

For further experiments. Contains manual fix for a case where CIL's alpha renaming gets confused. openssl.cil-fixed.zip

michael-schwarz commented 2 years ago

All allocations happen via wrappers, leading to only one heap object with content supertop. Most function calls happen via a hashtable of function pointers, and that one is also represented by supertop.

I'll try if adding some custom wrappers helps: --set ana.malloc.wrappers[+] CRYPTO_malloc --set ana.malloc.wrappers[+] CRYPTO_zalloc

michael-schwarz commented 2 years ago

Did not really have any effect.

I think we can probably give up on getting meaningful results here in near future, barring major improvements to several parts.

sim642 commented 2 years ago

What's with the fixpoint error and CIL's alpha renaming though? Should probably have issues/PRs for those even if we cannot get anything useful on OpenSSL.

michael-schwarz commented 2 years ago

For the fixpoint issue I'm running creduce since this morning (down to 9MB from 36MB, hoping it will get even smaller) and will open an issue with that example program.

The alpha renaming thing is probably not so critical, but I guess I can also open an issue for that. It is just non-trivial to reproduce.