claudemarche commented 2 years ago

In short: I propose that \valid(p) and \valid_read(p) include in particular \initialized(p).

This issue is related to issues #74 and #81.

In much more details: historically, the predicate \valid(p) is inherited from Caduceus. Its informal meaning, or informal intention, was to be the natural pre-condition for accessing to the value pointed by p, either by reading (*p) or writing (*p = ...). It was included as is in ACSL from the beginning. Later on, the variant \valid_read(p) was introduced, with the intention to be the natural pre-condition for access but not allowing writing, which is the typical situation when the pointer points to const data. Much later on, the predicate \initialized was introduced because of a need to distinguish between data that is allocated but not yet initialized, and initialized data. reading non-initialized data is not forbidden by the C norm, but the result is unspecified, and in particular there is no guarantee that reading twice the same non-initialized memory location will give the same value.

The main problem is that in the majority of examples that exist so far, e.g. in the ACSL by example document, Allan Blanchard tutorial, or the gallery of verified programs of Toccata, the \initialized predicate is not used, and \valid is used as it was meaning \valid(p) && \initialized. In a strict interpretation of \valid all these examples are in fact incorrect, even if they were proved correct by Jessie or by WP.

This annoying situation is hot nowadays, not only because of the issues #74 and #81, but also because of the writing of the ACSL chapter of the Frama-C book, in which a clear meaning for \valid is expected.

The proposal is then:

to rename the current predicate \valid and \valid_read into new names such as \write_permitted and \read_permitted.
to give to \valid(p) the new meaning \write_permitted(p) && \initialized(p) and to valid_read(p) the meaning \read_permitted(p) && \initialized(p).

Pros

The semantics of ACSL would be clarified
The vast majority of examples in circulation would remain correct

Cons

The EVA plugin should be updated with a renaming of \valid into \write_permitted, and \valid_read into read_permitted. I assumed it should be only a renaming, feel free to correct me.
The RTE plugin should be audited to check if it generates the appropriate alarm. The alarm for memory writing should be \write_permitted, whereas for memory access it should probably remain \valid_read.
The WP plugin should be audited to, though I don't know what should be changed.
The contracts for the libc should be reviewed, it is likely that a lot of \initialized could be removed because now subsumed by a \valid.

This was discusses longly this morning with Virgile and Allan, I probably forgot other arguments discussed, let's what comes in the upcoming discussion. Any comments welcome! @all

AllanBlanchard commented 2 years ago

I spent some time to think about all of this, and I am afraid that it may not be as simple as we expected. In particular:

Compatibility

The vast majority of examples in circulation would remain correct

I am not sure of this anymore, because of the assigns clause. For example:

struct X { int x; };

/*@ requires \valid(x);
  @ assigns *x ; */
void function(struct X* x);

int main(void){
  struct X x = { 0 };
  function(&x);
  //@ assert A: \valid(&x);
}

Here, everyone expects &x to remain valid after calling function, thus to prove A. But with the new semantics, it might not be the case because *x is assigned (and since it is a structure it is possible to "uninitialize" it). And the reason is that \valid(p) now talks about both p and the initialization status of the pointed value (thus a property of *p). Thus an assigns on *p might change a property related to p and I do not know if I like that.

Thus for a quite a lot of existing contracts, we should now specify in post-condition that some pointers remain \valid after the execution of the function.

Impact on equality, maybe not that bad

(for now let us say that reading an uninitialized value is undefined behavior)

The main problem is that in the majority of examples that exist so far, e.g. in the ACSL by example document, Allan Blanchard tutorial, or the gallery of verified programs of Toccata, the \initialized predicate is not used, and \valid is used as it was meaning \valid(p) && \initialized. In a strict interpretation of \valid all these examples are in fact incorrect, even if they were proved correct by Jessie or by WP.

In fact, it may not be a problem as long as analysis hypotheses are clear. For me it tends to say that we can treat functional correctness and absence of runtime errors separately. Let us see for example the swap function:

/*@ ensures *a == \old(*b) && *b == \old(*a); */
void swap(int* a, int* b){
  int tmp = *a ;
  *a = *b ;
  *b = tmp ;
}

For me, without proving absence of runtime errors, this ensures must be verified correct by a deductive verification tool. But for now the absence of runtime errors has not been verified. The property is true with the hypothesis that the program does not produce any runtime error. Thus, here, we assume that:

memory locations are writable,
the pointed memory is initialized.

If one now needs to be sure that the program does the right job according to C, we have to add the verification of absence of runtime errors. Thus, with the following requires:

/*@ requires \valid(a) && \valid(b) ;

We know that the program is verified, still assuming that:

the pointed memory is initialized.

Thus current examples aren't incorrect, they are verified assuming the hypothesis.

Quite a lot of predicates are in fact just there to explicit hypotheses that were made in the past. This is somewhat the case of object_pointer. That could be the case for a potential \aligned_access or \right_object_type (strict aliasing), etc. Adding these predicates would not make examples incorrect: their existence just highlight existing hypotheses.

Because in fact the example assumes that:

the pointed memory is readable/writable
initialized
pointers are aligned
they are indeed pointers to int objects
... probably other things that I am not aware of in the ISO C.

Thus for me the question is more on how we should consider unspecified behaviors/values. And I tend to say that a C value in ACSL annotations is equivalent to its actual value in the program if this value is not an indeterminate value. And I believe that it is indeed ab assumption that all analyzers have.

claudemarche commented 2 years ago

Concerning you compatibility point: I believe my proposal should be refined: my naive interpretation of \valid_read(p) when p is of type struct S * is that it is permitted to read and that all fields are initialized. So to be precise, my proposal should not be that \valid_read(p) means \read_permitted(p) && \initialized(p) but \read_permitted(p) && \initialized(&(p->f1)) && ... && \initialized(&(p->fk)) enumerating all fields of S.

But that's a minor technical point.

claudemarche commented 2 years ago

Concerning the other point: so your proposal means to me that the conformity of a code w.r.t. an ACSL specification should not include the safety of its executions. In other words, I should write in the ACSL chapter that a function conforms to its contract when for any program state S1 satisfying the pre-condition, for any safe execution of its body starting from state S1, and that terminates, the resulting state S2 satisfy the post-condition.

That's quite a weak definition to me, but that's a meaningful choice to make.

One funny consequence though, is that we don't need to put anymore the \valid_read or \valid preconditions we usually state. Disturbing at least. That would need to clearly say proving conformity of the code to a contract and proving safety of execution are two clearly distinguished objectives.

I'll need to think more about it, but it makes sense to me.

AllanBlanchard commented 2 years ago

Concerning you compatibility point: I believe my proposal should be refined: my naive interpretation of \valid_read(p) when p is of type struct S * is that it is permitted to read and that all fields are initialized. So to be precise [...]

I am not sure that I understand. Here:

struct X { int field ; };
/*@ requires \valid(x); 
  @ assigns *x ; */
void foo(struct X* x){
  struct X tmp[1] ;
  *x = tmp ;
}

There is no RTE, still *x is not "initialized" anymore (nor x->field that now holds an indeterminate value). Thus with the proposed semantics, x would not be valid (nor valid read) anymore.

pbaudin commented 2 years ago

In other words, I should write in the ACSL chapter that a function conforms to its contract when for any program state S1 satisfying the pre-condition, for any safe execution of its body starting from state S1, and that terminates, the resulting state S2 satisfy the post-condition.

That's quite a weak definition to me, but that's a meaningful choice to make.

It is the definition of the Weakest Liberal Precondition and I don't see a better choice.

But for the terminates clause, I can't accept the following result: the C function complies with the terminates clause of its contract if the execution of that C function terminates. I would like to prove a save termination.

claudemarche commented 2 years ago

There is no RTE, still *x is not "initialized" anymore (nor x->field that now holds an indeterminate value). Thus with the proposed semantics, x would not be valid (nor valid read) anymore.

OK, I missed your point. only \write_permitted would be preserved, not \valid.

claudemarche commented 2 years ago

It is the definition of the Weakest Liberal Precondition and I don't see a better choice.

It is indeed the WLP. But there is a better choice to me, which is the blocking semantics. WLP is an old defintion that is not well-suited for defining the validity of code assertions. For example in a code like

int f() {
  ... some code ...
  //@ assert P;
  while (1) {};
}

the WLP is unable to define what would it mean for the assertion to be valid.

So for me a better definition is : in a state where the precondition is valid, the body executes without blocking (which implies that any code annotation met is valid) and if it terminates the post is valid (which is indeed also covered by "non blocking")

The subtility that I didn't want to consider is the the distinction between blocking on a code annotation and blocking on a C statement (like division by zero of non-permitted memory access). I believe I'll need to distinguish those in my part of the ACSL chapter.

claudemarche commented 2 years ago

Summary : I understand that my proposal goes to far since it would require, for a code to be conforming with its annotations, to be free of run-time errors and undefined behaviors. Although this would be an ideal situation, and is usually what one proves when proving code in Java, Ada, Rust, etc., it is too much to ask for C code.

I still believe that the names \valid or \valid_reads are poor choice names, something like \write_permitted and \read_pernitted would have been better, but difficult to change.

I'll close this ticket.

acsl-language / acsl

Proposal for meaning of `\valid`, `\valid_read` w.r.t. `\initialized` #83

Pros

Cons

Compatibility

Impact on equality, maybe not that bad