OCamlPro / owi

WebAssembly Swissknife & cross-language bugfinder
https://ocamlpro.github.io/owi/
GNU Affero General Public License v3.0

Simplify out_of_bounds checks #393

filipeom closed this 3 months ago

filipeom commented 4 months ago

This also fixed some benchmarks of #273

zapashcanon commented 4 months ago

I'm not sure I get the link with the linked issue, could you explain?

filipeom commented 4 months ago

I'm not sure I get the link with the linked issue, could you explain?

Because this (I32.add base size) would overflow depending on the base pointer (i.e., if I decreased the stack size during compilation, it no longer overflowed), causing those spurious heap-overflow traps.
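A minimal sketch of the wrap-around being described (the names base, size and the check itself are illustrative, not Owi's actual code): on 32-bit integers, computing base + size wraps for a high base pointer, so an in-bounds access can fail the bound check and trap spuriously.

```ocaml
(* Hypothetical example: [base] and [size] stand in for a heap chunk. *)
let base = 0xFFFF_FFF0l (* high base pointer, near the top of the address space *)
let size = 0x20l        (* a 32-byte heap chunk *)

(* unsigned [<] on Int32 (Int32.unsigned_compare is in the stdlib since 4.08) *)
let unsigned_lt a b = Int32.unsigned_compare a b < 0

let () =
  let bound = Int32.add base size in
  (* 0xFFFF_FFF0 + 0x20 = 0x1_0000_0010, which wraps to 0x10 on 32 bits *)
  assert (bound = 0x10l);
  (* offset 0 is clearly inside the chunk, yet a check of the form
     [base + offset < base + size] rejects it once the bound has wrapped *)
  let addr = Int32.add base 0l in
  assert (not (unsigned_lt addr bound))
```

Lowering the base (e.g. by shrinking the stack, as mentioned above) keeps base + size below 2^32, which is why the spurious trap disappeared.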

zapashcanon commented 4 months ago

Oh OK (I guess you mean #371, then?)

zapashcanon commented 4 months ago

Aren't we going to miss some heap overflows with this simplification?

filipeom commented 4 months ago

Aren't we going to miss some heap overflows with this simplification?

I'm not quite sure we would; can you give an example? Either way, I can close this, as I would rather not introduce bugs by mistake :sweat_smile:

zapashcanon commented 4 months ago

For instance, shouldn't we check that base + offset < size?

Anyway, the results are good. tool1 is the previous main after adding concretisation of symbolic addresses, and tool2 is this PR plus --fail-on-assertion-only:

tool1 had 689 reached and tool2 had 675 reached
tool1 had 526 timeout and tool2 had 540 timeout
tool1 had 000 nothing and tool2 had 000 nothing
tool1 had 001 other   and tool2 had 001 other
tool1 had 000 killed  and tool2 had 000 killed

tools have 675 reached tasks in common

tool1 had 014 tasks tool2 did not find
tool2 had 000 tasks tool1 did not find

on      commonly  reached tasks, tool 1 took 1506.669011 sec. (mean 2.232102) and tool 2 took 1637.898090 sec. (mean 2.426516)
on *not commonly* reached tasks, tool 1 took 63.164101 sec. (mean 4.511721) and tool 2 took 0.000000 sec. (mean -nan)

among tasks reached only by tool 1, tool 2 replied 000 nothing, 014 timeout, 00 other and 00 killed
among tasks reached only by tool 2, tool 1 replied 000 nothing, 000 timeout, 00 other and 00 killed

So we are much better than the old version, where the score was around 608, and we did not lose that much compared to the version where we reported wrong errors: only a difference of 14 cases. :)

I also checked, and there are no more failures due to a trap. I'll start a benchmark with --fail-on-assertion-only but without this PR to check what is going on. :)

filipeom commented 4 months ago

For instance, shouldn't we check that base + offset < size?

size is just the size of the heap chunk starting at base, and that is what we were previously checking. I.e., valid?(base + offset < base + size), which is equivalent to valid?(offset < size).
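The equivalence above can be sketched as follows (illustrative names, not Owi's actual code): since the chunk spans [base, base + size), an access at base + offset is in bounds exactly when offset < size, and this form never computes base + size, so it cannot wrap.

```ocaml
(* unsigned [<] on Int32 *)
let unsigned_lt a b = Int32.unsigned_compare a b < 0

(* Simplified bounds check: no addition involving the base pointer at all,
   so there is nothing left to overflow. *)
let in_bounds ~offset ~size = unsigned_lt offset size

let () =
  let size = 0x20l in
  (* first byte of the chunk: accepted *)
  assert (in_bounds ~offset:0l ~size);
  (* one past the end of the chunk: rejected *)
  assert (not (in_bounds ~offset:0x20l ~size))
```

The key point is that the simplified predicate accepts exactly the same offsets as the unsimplified one would in exact arithmetic, while dropping the wrapping term.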

zapashcanon commented 4 months ago

OK then this looks good to me. :)

zapashcanon commented 4 months ago

Difference between this PR and --fail-on-assertion-only:

tool1 had 676 reached and tool2 had 675 reached
tool1 had 539 timeout and tool2 had 540 timeout
tool1 had 000 nothing and tool2 had 000 nothing
tool1 had 001 other   and tool2 had 001 other
tool1 had 000 killed  and tool2 had 000 killed

tools have 672 reached tasks in common

tool1 had 004 tasks tool2 did not find
tool2 had 003 tasks tool1 did not find

on      commonly  reached tasks, tool 1 took 1491.607833 sec. (mean 2.219655) and tool 2 took 1578.647890 sec. (mean 2.349178)
on *not commonly* reached tasks, tool 1 took 76.909100 sec. (mean 19.227275) and tool 2 took 59.250200 sec. (mean 19.750067)

among tasks reached only by tool 1, tool 2 replied 000 nothing, 004 timeout, 00 other and 00 killed
among tasks reached only by tool 2, tool 1 replied 000 nothing, 003 timeout, 00 other and 00 killed

zapashcanon commented 4 months ago

Better diff:

tool1 had 676 reached and tool2 had 675 reached
tool1 had 539 timeout and tool2 had 540 timeout
tool1 had 000 nothing and tool2 had 000 nothing
tool1 had 001 other   and tool2 had 001 other
tool1 had 000 killed  and tool2 had 000 killed

tools have 672 reached tasks in common

tool1 had 004 tasks tool2 did not find
tool2 had 003 tasks tool1 did not find

on      commonly  reached tasks, tool 1 took 1491.607833 sec. (mean 2.219655, median 0.512458, min 0.229161, max 29.729300) and tool 2 took 1578.647890 sec. (mean 2.349178, median 0.524138, min 0.267979, max 29.508700)
on *not commonly* reached tasks, tool 1 took 76.909100 sec. (mean 19.227275, median 24.996700, min 12.968900, max 26.641300) and tool 2 took 59.250200 sec. (mean 19.750067, median 21.770300, min 13.532400, max 23.947500)

among tasks reached only by tool 1, tool 2 replied 000 nothing, 004 timeout, 00 other and 00 killed
among tasks reached only by tool 2, tool 1 replied 000 nothing, 003 timeout, 00 other and 00 killed

tasks solved only by tool 1:
  testcomp/sv-benchmarks/c/xcsp/AllInterval-014.c
  testcomp/sv-benchmarks/c/nla-digbench-scaling/freire2_valuebound20.c
  testcomp/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.peg_solitaire.6.prop1-back-serstep.c
  testcomp/sv-benchmarks/c/array-fpi/ifcompf.c
tasks solved only by tool 2:
  testcomp/sv-benchmarks/c/xcsp/CostasArray-12.c
  testcomp/sv-benchmarks/c/reducercommutativity/rangesum.i
  testcomp/sv-benchmarks/c/hardware-verification-bv/btor2c-lazyMod.adding.4.prop1-back-serstep.c