Closed — filipeom closed this 3 months ago
I'm not sure I see the connection with the linked issue; could you explain?
Because this (`I32.add base size`) would overflow depending on the base pointer (i.e., if I decreased the stack size during compilation it no longer overflowed), causing those spurious heap overflow traps.
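To illustrate the overflow, here is a minimal OCaml sketch (not the tool's actual code; the names are made up): with 32-bit wrap-around, the upper bound `base + size` can wrap for a large base pointer, so the old comparison can spuriously flag an in-bounds access, while `offset < size` cannot overflow.

```ocaml
(* Minimal sketch, not the tool's actual code: with 32-bit wrap-around,
   the upper bound [base + size] can overflow for a large base pointer,
   so a check phrased as [base + offset < base + size] can spuriously
   report an out-of-bounds access even though [offset < size] holds. *)
let ult a b = Int32.unsigned_compare a b < 0 (* unsigned < on int32 *)

(* old check: either side may wrap past 0xFFFF_FFFF *)
let old_check ~base ~offset ~size =
  ult (Int32.add base offset) (Int32.add base size)

(* simplified check: no addition, hence no overflow *)
let new_check ~offset ~size = ult offset size

let () =
  let base = -8l (* 0xFFFF_FFF8, near the top of the 32-bit range *)
  and offset = 4l
  and size = 16l in
  (* prints: old check: false (spurious trap), simplified check: true *)
  Printf.printf "old check: %b, simplified check: %b\n"
    (old_check ~base ~offset ~size)
    (new_check ~offset ~size)
```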
Oh OK (I guess you mean #371 then?)
Aren't we going to miss some heap overflows with this simplification?
I'm not quite sure we would; can you give an example? Either way, I can close this, as I would rather not introduce bugs by mistake :sweat_smile:
For instance, shouldn't we check that `base + offset < size`?
Anyway, the results are good. `tool1` is the previous `main` after adding concretisation of symbolic addresses, and `tool2` is this PR + `--fail-on-assertion-only`:
tool1 had 689 reached and tool2 had 675 reached
tool1 had 526 timeout and tool2 had 540 timeout
tool1 had 000 nothing and tool2 had 000 nothing
tool1 had 001 other and tool2 had 001 other
tool1 had 000 killed and tool2 had 000 killed
tools have 675 reached tasks in common
tool1 had 014 tasks tool2 did not found
tool2 had 000 tasks tool1 did not found
on commonly reached tasks, tool 1 took 1506.669011 sec. (mean 2.232102) and tool 2 took 1637.898090 sec. (mean 2.426516)
on *not commonly* reached tasks, tool 1 took 63.164101 sec. (mean 4.511721) and tool 2 took 0.000000 sec. (mean -nan)
among tasks reached only by tool 1, tool 2 replied 000 nothing, 014 timeout, 00 other and 00 killed
among tasks reached only by tool 2, tool 1 replied 000 nothing, 000 timeout, 00 other and 00 killed
So we are much better than the old version, where the score was around 608, and we did not lose that much compared to the version where we reported wrong errors: only a difference of 14 cases. :)
I also checked, and there are no more failures due to a trap. I'll start a benchmark with `--fail-on-assertion-only` but without this PR to check what is going on. :)
> For instance, shouldn't we check that `base + offset < size`?
`size` is just the size of the heap chunk starting at `base`, and that is what we were previously checking, i.e., `valid?(base + offset < base + size)`, which is equivalent to `valid?(offset < size)`.
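As a sanity check of that equivalence, here is a small hypothetical OCaml sketch (not the project's code, and it assumes a 64-bit platform): when both sums are computed without 32-bit wrap-around, `base + offset < base + size` holds exactly when `offset < size` does, so the simplified check keeps the intended meaning while avoiding the overflow.

```ocaml
(* Hypothetical sanity check, not the project's code (assumes a 64-bit
   platform): compare the wrap-free meaning of [base + offset < base + size]
   (sums computed in OCaml's native integers) against the simplified
   unsigned check [offset < size], on random 32-bit values. *)
let to_u64 (x : int32) = Int32.to_int x land 0xFFFF_FFFF

let intended ~base ~offset ~size =
  to_u64 base + to_u64 offset < to_u64 base + to_u64 size

let simplified ~offset ~size = Int32.unsigned_compare offset size < 0

let () =
  Random.self_init ();
  for _ = 1 to 100_000 do
    let base = Random.bits32 ()
    and offset = Random.bits32 ()
    and size = Random.bits32 () in
    assert (intended ~base ~offset ~size = simplified ~offset ~size)
  done;
  print_endline "simplified check agrees with the wrap-free one"
```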
OK then this looks good to me. :)
Difference between this PR and `--fail-on-assertion-only`:
tool1 had 676 reached and tool2 had 675 reached
tool1 had 539 timeout and tool2 had 540 timeout
tool1 had 000 nothing and tool2 had 000 nothing
tool1 had 001 other and tool2 had 001 other
tool1 had 000 killed and tool2 had 000 killed
tools have 672 reached tasks in common
tool1 had 004 tasks tool2 did not found
tool2 had 003 tasks tool1 did not found
on commonly reached tasks, tool 1 took 1491.607833 sec. (mean 2.219655) and tool 2 took 1578.647890 sec. (mean 2.349178)
on *not commonly* reached tasks, tool 1 took 76.909100 sec. (mean 19.227275) and tool 2 took 59.250200 sec. (mean 19.750067)
among tasks reached only by tool 1, tool 2 replied 000 nothing, 004 timeout, 00 other and 00 killed
among tasks reached only by tool 2, tool 1 replied 000 nothing, 003 timeout, 00 other and 00 killed
Better diff:
tool1 had 676 reached and tool2 had 675 reached
tool1 had 539 timeout and tool2 had 540 timeout
tool1 had 000 nothing and tool2 had 000 nothing
tool1 had 001 other and tool2 had 001 other
tool1 had 000 killed and tool2 had 000 killed
tools have 672 reached tasks in common
tool1 had 004 tasks tool2 did not found
tool2 had 003 tasks tool1 did not found
on commonly reached tasks, tool 1 took 1491.607833 sec. (mean 2.219655, median 0.512458, min 0.229161, max 29.729300) and tool 2 took 1578.647890 sec. (mean 2.349178, median 0.524138, min 0.267979, max 29.508700)
on *not commonly* reached tasks, tool 1 took 76.909100 sec. (mean 19.227275, median 24.996700, min 12.968900, max 26.641300) and tool 2 took 59.250200 sec. (mean 19.750067, median 21.770300, min 13.532400, max 23.947500)
among tasks reached only by tool 1, tool 2 replied 000 nothing, 004 timeout, 00 other and 00 killed
among tasks reached only by tool 2, tool 1 replied 000 nothing, 003 timeout, 00 other and 00 killed
tasks solved only by tool 1:
testcomp/sv-benchmarks/c/xcsp/AllInterval-014.c
testcomp/sv-benchmarks/c/nla-digbench-scaling/freire2_valuebound20.c
testcomp/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.peg_solitaire.6.prop1-back-serstep.c
testcomp/sv-benchmarks/c/array-fpi/ifcompf.c
tasks solved only by tool 2:
testcomp/sv-benchmarks/c/xcsp/CostasArray-12.c
testcomp/sv-benchmarks/c/reducercommutativity/rangesum.i
testcomp/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.adding.4.prop1-back-serstep.c
This also fixed some benchmarks from #273.