Open rod-chapman opened 7 months ago
Changed logic from "ALL" to "AUFBVDTNIA" for all files, and updated name of enclosing directory accordingly.
The CI complains because some benchmarks are duplicates:
$ find . ! -empty -type f -name '*smt2' -exec md5sum {} + | sort | uniq -w32 -d | cut -d ' ' -f 3
./mlkem.adb_1802_33_loop_invariant_preserv_1.smt2
./mlkem.adb_939_22_assert_1.smt2
./mlkem.adb_1430_47_range_check_1.smt2
./mlkem.adb_187_25_assert_1.smt2
./mlkem.adb_218_25_assert_1.smt2
./mlkem.adb_1802_33_loop_invariant_init_1.smt2
./mlkem.adb_1810_22_assert_1.smt2
Got it... rather surprisingly, Why3 is capable of generating duplicate SMT files in some cases. Removed now.
Ah... I see that "dolmen" rejects the use of "bv2nat" - is that show-stopping?
In my understanding the SMT-LIB standard will soon include bv2nat
.
I think the best way forward is that we check the benchmarks as far as we can get, but only include them in a release once bv2nat
is supported. Should the standard function have a different name (like bv.to-int
) we can fix that on our side.
The metadata-header looks good, so once Dolmen has a bv2nat
mode ( https://github.com/Gbury/dolmen/issues/205 ) we know if there are any other issues.
I was able to check the benchmarks locally with Dolmen 0.10 and with the --ext=bvconv
extension. They all passed.
Hence, the benchmarks are as good as the can be right now.
Since this extension is unfortunately not in the standard yet, we can't include them in a release, but I added a tag to mark them as checked.
This PR adds SMTLib benchmarks that arise from the verification of the LibMLKEM library.
LibMLKEM is a formal reference implementation of the FIPS-203 MLKEM cryptographic algorithm (formally known as "Kyber")
The SMT files have been generated by the SPARK toolset (version 13.2.1), supported by Why3 (version 1.5), from the LibMLKEM sources at https://github.com/awslabs/LibMLKEM at commit 0bd402c with the following modifications:
A portfolio of CVC4, CVC5, Z3 and Alt-Ergo can prove all of these, but no one prover is good enough to prove them all.