Open vogler opened 3 years ago
Context: https://github.com/goblint/analyzer/pull/286#issuecomment-879788192
I only realized this when I used regtest (which awfully does similar expected comparisons within Goblint...) and saw different output.
I agree with 'awful' concerning duplication, but IMO it'd be better for goblint to handle these things instead of some ruby script - ideally we'd get rid of it in favor of an OCaml executable or some minimal shell script (parallel
) with goblint doing most of the work.
https://github.com/goblint/analyzer/pull/286#issuecomment-879848994
Also, once could use the syntactic search to identify those places where
__goblint_check(...)
is called that are unreachable and then also warn there that the assert is unreachable. (These are the cases where warnings are missing in./regtest.sh
vs the ruby script).
Good point to keep in mind.
This (together with the rest?) could be done as post-processing in finalize
of the assert-analysis.
using something along the lines of
__goblint_check(a == 42, __GOBLINT_UNKNOWN)
or__goblint_check(x ==17, __GOBLINT_SUCCESS)
in ehe regression tests instead of having anassert(...) //UNKNOWN
.
Pro:
Con:
assert
is easily understood and portable (assert.h
is available on every system)https://github.com/goblint/analyzer/pull/286#issuecomment-879851445
Tbh, I don't really like the idea of this regression testing check (
dbg.regression
for regtest) being inside Goblint itself at all. It's ideologically wrong to have the testing logic right in the middle of Goblint (which would be both the tester and the testee simultaneously) instead of being independent. And having to write__goblint_check(x == 17, __GOBLINT_SUCCESS)
is so much more verbose than a standardassert(x == 17)
.But that's all beside the point for this issue of update_suite not reporting possible unsoundness cases. If we want to discuss completely redoing the regression testing architecture, it's best to have another issue for that. And that won't solve the fact that we still have ambiguous uses of
UNKNOWN
.
Tbh, I don't really like the idea of this regression testing check (
dbg.regression
for regtest) being inside Goblint itself at all. It's ideologically wrong to have the testing logic right in the middle of Goblint (which would be both the tester and the testee simultaneously) instead of being independent.
In principle yes, but the case here (at least at the moment) is just comparing strings, so it shouldn't be a problem. I think it's nice to be able to run goblint on a file and let it check the assertions. If you want that feature, the rest is not much more on top.
And having to write
__goblint_check(x == 17, __GOBLINT_SUCCESS)
is so much more verbose than a standardassert(x == 17)
.
Agree.
dune runtest
/dune promote
is nice, but it's also nice to have each test contain the expected result.
https://dune.readthedocs.io/en/stable/tests.html#diffing-the-result
Maybe goblint (or some script) could annotate a file with comments for each assert/warning which could then be used for diff
/promote
.
That's an interesting idea because it could be used to test for arbitrary warnings (or the lack thereof, which currently must be manually annotated by NOWARN
). And it would be testing the actual Goblint output for assertions, just like an user would see on actual programs.
Although it might be too complicated to implement because we'd need to copy the input file and then modify it to add the comments purely based on CIL location
s. And it takes away the possibility for TODO
to allow precision improvement without making it an error.
Although it might be too complicated to implement because we'd need to copy the input file and then modify it to add the comments purely based on CIL
location
s.
One needs to come up with a stable solution, but I don't think it'd be too hard.
Could just use //@
to indicate these annotations, strip //@.*
from the input, then append goblint's WARN X
as //@ WARN X
.
And it takes away the possibility for
TODO
to allow precision improvement without making it an error.
Could just leave TODO
in place, then have a sep. diff to detect those improvements (just as output, not as an error).
Problems with the current solution (update_suite.rb):
gem install parallel
dune runtest
it could only re-run only things that changed (also it could keep preprocessed input, syntax-highlighted source etc.)Original comment for this issue was regarding the last point.
@jerhard and I were tossing this back and forth over lunch and something we discussed at some point was using something along the lines of
__goblint_check(a == 42, __GOBLINT_UNKNOWN)
or__goblint_check(x ==17, __GOBLINT_SUCCESS)
in the regression tests instead of having anassert(...) //UNKNOWN
.Then, the expected result is immediately obvious to Goblint itself, and Goblint can either output things for all asserts or only those where something failed without having to somehow do a regex on the comments.
Also, once could use the syntactic search to identify those places where
__goblint_check(...)
is called that are unreachable and then also warn there that the assert is unreachable. (These are the cases where warnings are missing in./regtest.sh
vs the ruby script).The question that is still open is how to integrate warnings different from asserts into this setting.
Opinions?
Originally posted by @michael-schwarz in https://github.com/goblint/analyzer/issues/286#issuecomment-879796205