goblint / analyzer

Static analysis framework for C
https://goblint.in.tum.de
MIT License
164 stars 72 forks source link

better regression testing #288

Open vogler opened 3 years ago

vogler commented 3 years ago

Problems with the current solution (update_suite.rb):

Original comment for this issue was regarding the last point.


@jerhard and I were tossing this back and forth over lunch and something we discussed at some point was using something along the lines of __goblint_check(a == 42, __GOBLINT_UNKNOWN) or __goblint_check(x ==17, __GOBLINT_SUCCESS) in the regression tests instead of having an assert(...) //UNKNOWN.

Then, the expected result is immediately obvious to Goblint itself, and Goblint can either output things for all asserts or only those where something failed without having to somehow do a regex on the comments.

Also, once could use the syntactic search to identify those places where __goblint_check(...) is called that are unreachable and then also warn there that the assert is unreachable. (These are the cases where warnings are missing in ./regtest.sh vs the ruby script).

The question that is still open is how to integrate warnings different from asserts into this setting.

Opinions?

Originally posted by @michael-schwarz in https://github.com/goblint/analyzer/issues/286#issuecomment-879796205

vogler commented 3 years ago

Context: https://github.com/goblint/analyzer/pull/286#issuecomment-879788192

I only realized this when I used regtest (which awfully does similar expected comparisons within Goblint...) and saw different output.

I agree with 'awful' concerning duplication, but IMO it'd be better for goblint to handle these things instead of some ruby script - ideally we'd get rid of it in favor of an OCaml executable or some minimal shell script (parallel) with goblint doing most of the work.

https://github.com/goblint/analyzer/pull/286#issuecomment-879848994

Also, once could use the syntactic search to identify those places where __goblint_check(...) is called that are unreachable and then also warn there that the assert is unreachable. (These are the cases where warnings are missing in ./regtest.sh vs the ruby script).

Good point to keep in mind. This (together with the rest?) could be done as post-processing in finalize of the assert-analysis.

using something along the lines of __goblint_check(a == 42, __GOBLINT_UNKNOWN) or __goblint_check(x ==17, __GOBLINT_SUCCESS) in ehe regression tests instead of having an assert(...) //UNKNOWN.

Pro:

Con:

https://github.com/goblint/analyzer/pull/286#issuecomment-879851445

Tbh, I don't really like the idea of this regression testing check (dbg.regression for regtest) being inside Goblint itself at all. It's ideologically wrong to have the testing logic right in the middle of Goblint (which would be both the tester and the testee simultaneously) instead of being independent. And having to write __goblint_check(x == 17, __GOBLINT_SUCCESS) is so much more verbose than a standard assert(x == 17).

But that's all beside the point for this issue of update_suite not reporting possible unsoundness cases. If we want to discuss completely redoing the regression testing architecture, it's best to have another issue for that. And that won't solve the fact that we still have ambiguous uses of UNKNOWN.

vogler commented 3 years ago

Tbh, I don't really like the idea of this regression testing check (dbg.regression for regtest) being inside Goblint itself at all. It's ideologically wrong to have the testing logic right in the middle of Goblint (which would be both the tester and the testee simultaneously) instead of being independent.

In principle yes, but the case here (at least at the moment) is just comparing strings, so it shouldn't be a problem. I think it's nice to be able to run goblint on a file and let it check the assertions. If you want that feature, the rest is not much more on top.

And having to write __goblint_check(x == 17, __GOBLINT_SUCCESS) is so much more verbose than a standard assert(x == 17).

Agree.

vogler commented 3 years ago

dune runtest/dune promote is nice, but it's also nice to have each test contain the expected result. https://dune.readthedocs.io/en/stable/tests.html#diffing-the-result

Maybe goblint (or some script) could annotate a file with comments for each assert/warning which could then be used for diff/promote.

sim642 commented 3 years ago

That's an interesting idea because it could be used to test for arbitrary warnings (or the lack thereof, which currently must be manually annotated by NOWARN). And it would be testing the actual Goblint output for assertions, just like an user would see on actual programs.

Although it might be too complicated to implement because we'd need to copy the input file and then modify it to add the comments purely based on CIL locations. And it takes away the possibility for TODO to allow precision improvement without making it an error.

vogler commented 3 years ago

Although it might be too complicated to implement because we'd need to copy the input file and then modify it to add the comments purely based on CIL locations.

One needs to come up with a stable solution, but I don't think it'd be too hard. Could just use //@ to indicate these annotations, strip //@.* from the input, then append goblint's WARN X as //@ WARN X.

And it takes away the possibility for TODO to allow precision improvement without making it an error.

Could just leave TODO in place, then have a sep. diff to detect those improvements (just as output, not as an error).