Destination region writable

mkschreder commented 1 year ago

What is the meaning of "destination region writable" and how do I fix it in code?

Here is a very simple matrix addition function:

/*
 * C = A + B
 */
void add(float *C, const float *const A, const float *const B, uint16_t row, uint16_t column)
{
    float Cr[row * column];

    for (uint16_t i = 0; i < row; i++) {
        for (uint16_t j = 0; j < column; j++) {
            Cr[i * column + j] = A[i * column + j] + B[i * column + j];
        }
    }

    memcpy(C, Cr, sizeof(float) * row * column);
}

cbmc -Iinclude --function add --unwind 5 src/linalg/add.c

src/linalg/add.c function add
[add.assertion.1] line 21 assertion C: FAILURE
[add.precondition_instance.1] line 28 memcpy src/dst overlap: SUCCESS
[add.precondition_instance.2] line 28 memcpy source region readable: SUCCESS
[add.precondition_instance.3] line 28 memcpy destination region writeable: FAILURE

How can I express the requirement that C must always be large enough to contain the result?

mkschreder commented 1 year ago

I'm trying to use the function contracts but still without success:

void add(float *C, const float *const A, const float *const B, uint16_t row, uint16_t column)
    __CPROVER_requires(__CPROVER_is_fresh(C, sizeof(float)*row*column))
    __CPROVER_assigns(C: __CPROVER_object_upto(C, row * column))
{
    float Cr[row * column];

    for (uint16_t i = 0; i < row; i++) {
        for (uint16_t j = 0; j < column; j++) {
            Cr[i * column + j] = A[i * column + j] + B[i * column + j];
        }
    }
    memcpy(C, Cr, sizeof(float) * row * column);
}

Condition: is_loop_free(function_body, ns, log)                                                                                                                                                                                                                                                                              
Reason: Loops remain in function 'add', assigns clause checking instrumentation cannot be applied.

TGWDB commented 1 year ago

Tagging @remi-delmas-3000 as better with the contracts details than me. But without some information in your first example there is no way to know for sure that C is suitably wriatble. My guess (that works locally) would be to inform cbmc that C is suitably writable with something like

__CPROVER_assume(__CPROVER_w_ok(C, sizeof(float) * row * column));

That is, tell cbmc to assume that C can be written for the size of the data we expect. However, someone more informed on the contracts work should probably also check this.

remi-delmas-3000 commented 1 year ago

Hi, @mkschreder if you want to analyse a function that manipulates pointers, you will have to write a proof harness that allocates the memory objects that the function consumes like so:

int main_bounded() {
    uint16_t row = 5;
    uint16_t column = 5;
    float *C = malloc(sizeof(float) * size);
    float *A = malloc(sizeof(float) * size);
    float *B = malloc(sizeof(float) * size);
    add(C, A, B, row, column);
}

The reason for that is that CBMC only knows about objects explicitly allocated by the program. This analysis will be bounded to the size of inputs and loops fully unwound, this will not scale if inputs grow large.

To do an unbounded analysis using contracts you have to specify the memory footprint of each argument, and the footprint of the function, the footprint of the loops and loop invariants and variants (for termination proof).

/*
 * C = A + B
 */

void add(float *C, const float *const A, const float *const B, uint16_t row, uint16_t column, size_t size)
    // clang-format off
__CPROVER_assigns(__CPROVER_object_whole(C))
__CPROVER_requires(size == (size_t)row * (size_t)column)
__CPROVER_requires(__CPROVER_is_fresh(A, sizeof(float) * size))
__CPROVER_requires(__CPROVER_is_fresh(B, sizeof(float) * size))
__CPROVER_requires(__CPROVER_is_fresh(C, sizeof(float) * size))
// clang-format on
{
    float Cr[row * column];

    for (uint16_t i = 0; i < row; i++)
        // clang-format off
    __CPROVER_assigns(i, __CPROVER_object_whole(Cr))
    __CPROVER_loop_invariant(0<=i && i<= row)
    __CPROVER_decreases(row - i)
        // clang-format on
        {
            for (uint16_t j = 0; j < column; j++)
                // clang-format off
        __CPROVER_assigns(j, __CPROVER_object_whole(Cr))
        __CPROVER_loop_invariant(0 <= j && j<= column)
        __CPROVER_decreases(column - j)
                // clang-format on
                {
                    Cr[i * column + j] = A[i * column + j] + B[i * column + j];
                }
        }

    memcpy(C, Cr, sizeof(float) * row * column);
}

I added the extra size input to avoid repeating multiplication operations in the contract and the body of the function (think of it as ghost code)

Then you would write the following harness:

int main() {
    float *C;
    float *A;
    float *B;
    uint16_t row;
    uint16_t column;
    size_t size =  (size_t)row * (size_t)column;
    add(C, A, B, row, column ,size);
    return 0;
}

and invoke the tools like so:

goto-cc --function main.c -o a.out
goto-instrument --apply-loop-contracts a.out b.out
goto-instrument --enforce-contract add b.out c.out
cbmc --pointer-check --bounds-check --signed-overflow-check --unsigned-overflow-check --conversion-check --stop-on-fail c.out

This already reveals possible overflows.

However even with the overflow fixed I don't think the proof will finish in reasonable time due to the presence of repeated multiplications which are known to be hard to reason about for SAT solvers.

The function will need some refactoring to share common intermediate product and array index expressions.

void add(float *C, const float *const A, const float *const B, uint16_t row, uint16_t column, size_t size)
    // clang-format off
__CPROVER_assigns(__CPROVER_object_whole(C))
__CPROVER_requires(size == (size_t)row * (size_t)column)
__CPROVER_requires(__CPROVER_is_fresh(A, sizeof(float) * size))
__CPROVER_requires(__CPROVER_is_fresh(B, sizeof(float) * size))
__CPROVER_requires(__CPROVER_is_fresh(C, sizeof(float) * size))
// clang-format on
{
    float Cr[size];

    for (uint16_t i = 0; i < row; i++)
        // clang-format off
    __CPROVER_assigns(i, __CPROVER_object_whole(Cr))
    __CPROVER_loop_invariant(0<=i && i<= row)
    __CPROVER_decreases(row - i)
        // clang-format on
        {
            for (uint16_t j = 0; j < column; j++)
                // clang-format off
        __CPROVER_assigns(j, __CPROVER_object_whole(Cr))
        __CPROVER_loop_invariant(0 <= j && j<= column)
        __CPROVER_decreases(column - j)
                // clang-format on
                {
                    size_t dest = (size_t)i * (size_t)column + (size_t)j;
                    Cr[dest] = A[dest] + B[dest];
                }
        }

    memcpy(C, Cr, sizeof(float) * size);
}

To try and make things faster you could install the kissat sat solver in your PATH and add this CBMC switch but it likely will not make a big difference

 --external-sat-solver kissat

remi-delmas-3000 commented 1 year ago

So the property to prove for memory safety of 2D array lookups in A, B and C is just this:

int main()
{
    uint16_t row;
    uint16_t column;
    uint16_t i;
    uint16_t j;
    if (i < row && j < column)
    {
        __CPROVER_assert((size_t)i * (size_t)column + (size_t)j < (size_t)row * (size_t)column, "access in range");
    }
    return 0;
}

And this is already too hard Minisat, Cadical, Kissat or Z3 (using cbmc --incremental-smt2-solver 'z3 -smt2 -in' main.c)

diffblue / cbmc

Destination region writable #7941