Closed jerhard closed 2 years ago
I had a look at all the tests where the mutex-meet-tid
configuration produces one or more Unknown assertions, or where we consider an assert(0)
as potentially reachable. Below is a short description of my investigation. In some cases, the asserts seem to be unknown in the concrete, or reachable locations are marked with assert(0)
. Still, there are some interesting cases which we cannot do, at least not with mutex-meet-tid
. For example intra01/main.c
, fk-cmp01/main.c
, and flagLock01/main.c
.
intra01/main.c: Program flow can be excluded by pre-condition that write from other thread already happened and was overwritten
intra01/main2.c: same as intra01/main.c
rev01/main.c Reachable(!) location is annotated with assert(false);
intra01_join/main.c: same as intra01/main.c
thread01/main.c same as intra01/main.c
thread01/main2.c same as intra01/main.c
sync_01_true/main.c write by other thread can be assumed as flag variable written after the write is checked
sync_01_true/main2.c same as sync_01_true/main.c
sync_02_true/main.c same as sync_01_true/main.c
syn_02_true/main2.c same as sync_01_true/main.c
keybISR/main.c similar to sync_01_true/main.c
keybISR/main2.c similar to sync_01_true/main.c
wdt977_02/main.c
The test is at fault: clearly the intention was to have an assert(0)
guarded by an if
statement in lines 302-304, but the if
is followed by a semicolon, making the assert unguarded. Thus, the assert(0)
is reachable in the concrete.
wdt977_02/main2.c Same as wdt977_02/main.c
fk-cmp01/main.c write by other thread can be assumed as flag variable written after the write is checked
fk2012/main.c The assert statement is wrong (not ensured in the concrete).
flagLock01/main.c Thread_joins on two threads ensure that both must have incremented the counter
i8xx_tco_01/main3.c _The assert in line 354 does not seem to be ensured in the concrete, as the thread main2 may write tco_expectclose directly before the read.
tso_na_01/main.c The assert seems not to be ensured in the concrete. The execution of y = 1; b = x; x = 1; a = y; should result in an assert failure.
Their smaller non-DDVerify benchmarks come from SV-COMP and I do recall seeing similar ones there, when searching for benchmarks. Since they're largely boolean flag based (and use these to model locks, etc), we obviously won't be able to solve them with just Apron. Looks like they used Z3, which can handle logical relations for them. But given that those are smaller compared to the DDVerify ones (where it sounds like we are more successful), the overall total we can prove might be quite good still, no?
The test case wdt977_02/main.c
were we have 30 failing asserts is actually broken.
The assert we report as failing is the following:
https://github.com/goblint/bench/blob/d80bb6c3d3f123b871cf7d7f5c22b86c41744afc/watts/wdt977_02/main.c#L302-L304
The intention most likely was to have a guarded assert(0)
. But the if
is followed by a semicolon, so the assert
is actually not guarded at all.
But given that those are smaller compared to the DDVerify ones (where it sounds like we are more successful), the overall total we can prove might be quite good still, no?
Yes, we have failing/unknown assert on 16 out of the 96 benchmarks and succeed on the other 80. And for 5 of those, it is not ensured in the concrete that the assert
will succeed.
Is there anything to still do here or can we merge this such that all the relational traces benchmarks are finally together in one place?
Yes, we can merge this.
This PR imports the tests from the Watts repository, which seem to be the ones used for this paper: Flow-sensitive composition of thread-modular abstract interpretation. Kusano and Wang.
The benchmark script is executed with
update_bench_traces_rel_watts.rb
.TODO:
assert(0)
s should actually be reachable.include <assert.h>
is commented out.