Closed viktormalik closed 1 year ago
@FrNecas you might want to have a look.
@viktormalik, is this ready to merge?
Ready now. We should do a release after the merge.
Ready now. We should do a release after the merge.
Can you please create a PR that bumps the version numbers?
Introduce a new abstract domain for reasoning about contents of arrays.
Template form
Each (loop-back) array is split into multiple segments. All array items in a segment are abstracted with a single segment element variable for which we compute an invariant in one of the basic domains, based on the array elements type. Currently, interval and heap domains are supported for the inner domain.
Let us have an array variable
a
and two segment border indicesi#1
,i#2
. If the indices can be ordered (i.e.i#1 < i#2
), we create a segmentation:If the indices cannot be ordered, we create two segmentations:
Template rows
All array elements from a segment
seg#i
are abstracted using a segment index variableidx#i
and a segment element variableelem#i = a#lb[idx#i]
. We let an inner domain to compute an invariant overelem#i
using the domain's standard template, whose template row guards are extended with additional segment bounds. A template row for a segment{idx#low}seg#i{idx#high}
has the form:where
T
is a template row overelem#i
in the inner domain, e.g.elem#i <= CONST
.Choice of segment border indices
At the moment, we use two sources for the segment borders:
The list of written indices is determined using the array index analysis which collects expressions occurring on the LHS (written) and RHS (read) of array accesses.
Invariant projection
In order to be able to use the invariants computed over the segment elements variables, we project the invariants onto all read indices of the given array (the set of read indices is determined using the mentioned array index analysis). The projection is done by copying the relevant part of the template and replacing the segment index variable by the read index. This makes sure that the constraint expressed by the invariant is used for the read index.
Combination with zones
Besides simple domains for a single variable (intervals, heap), the array domain can be combined with the zones domain. If both domains are used, a specific differences are created:
Computation of array invariants takes considerably longer compared to other domains. The main reason is that there's a lot of constraints that the solver must count with (segment border constraints, invariant projection, ...). To make the template smaller, this introduces an expression dependence analysis which determines which expressions may depend (in a data pr control) way on each other. Then, only differences of potentially dependent expressions are created inside the array-zones combination.
Resolves #106.
Many currently handled programs are added as regressions tests. The test suite and clang-format should pass after every commit.
An important remaining feature is generation of witnesses from array invariants but I'd submit that as a separate PR.