Open mattmccutchen-cci opened 3 years ago
Since many of the tests depend on the convert-project.py
, we'll probably want to handle at least part of #329 at the same time.
Update:
-alltypes
output compiles.-alltypes
output compiles except for some error message templates that we've designated as "known bounds inference limitations". However, we should still do more to assess the quality of the conversion.Unfortunately, both of these tests are currently far from passing. There is a long tail of 3C issues affecting the benchmarks, and fixing them hasn't been our highest priority.
Currently, the benchmark tests only check that 3C completes successfully (exit code 0). We should add more verification of the output. Ideas:
-alltypes
, the output should compile.With
-alltypes
, we could compare the output to a set of "reference" output files we maintain somewhere (similar to the manual evaluation procedure in the 3C document that we haven't used in a while). We have to think about how to make this useful: just having the tests fail every time the output changes may not be useful enough to justify the hassle.Another idea from Mike: Compare the count of each kind of checked pointer (similar to Tables 2-4 of the 3C paper submission (I won't paste the capability-granting link here)) and the number of Checked C type checking errors between the actual output and the reference output. If the count of checked pointers increases and the number of compile errors doesn't, that's probably good. We'd still have to decide how to translate that to a pass/fail status or how else to ensure we remember to monitor the quality of conversion.