acsl-language / acsl

Sources for the ANSI/ISO C Specification Language manual
Other
47 stars 8 forks source link

Exact program state of Pre and Post states #70

Open vprevosto opened 3 years ago

vprevosto commented 3 years ago

ACSL manual states that (section 2.3.1)

C function parameters are obtained by value from actual parameters that mostly remain unaltered by the function calls. For that reason, formal parameters in function contracts are defined such that they always refer implicitly to their values interpreted in the pre-state.

However, this leaves out some important questions, notably whether we can speak about the address of such formal parameters. Basically, given a function rettype f(prms) { body; return r; }, a call lval = f(args) can be decomposed as such (we assume that the body of f is normalized and there's a single return statement)

EDIT as mentioned in a comment below, the initial // Post labels were incorrectly placed.

// Pre1 
{ 
   rettype retvar;
// Pre2
  { 
     prms = args;
// Pre3
    {
        body;
        retvar = r;
-// Post1
    }
+// Post1
-// Post2
  }
+//Post2
 lval = retvar;
-// Post3
 }
+//Post3

In this context, there are three places where the Pre (as in \at(.,Pre)) state of f could be located. Pre2 is not really useful: it only give access to the location where the returned value will be temporarily placed, and by definition retvar is uninitialized until the Post state. Equating Pre with Pre1 implies that we cannot take the address of the formals in the pre state, and notably that it is not possible to state something like \initialized(&prm). Basically, the requires of f would be translated at Pre1 as assert R{prms <- args} where {prms <- args} denotes the substitution of prms by args. Equating Pre with Pre3 implies that we have addresses attached to the formals.

Similarly, equating Post with Post1 implies that we can speak about the addresses of the formals, and of retvar. In particular, if p holds the address of a formal, ensures \valid(p); will be true. This is not the case if we equate Post with Post2 or Post3, but in the former case it might be possible to refer to the adress of \result (so that it would be possible to state ensures \initialized(&\result), e.g. if we return a struct and want to state explicitly that each field has been initialized. Again, the ensures of f would be translated as assert E{prms<-args, \result<- retvar} (or \result <- lval for Post3).

At a quick glance, it would seem that equating Pre with Pre3 and Post with Post2 or Post3 would be preferable, but there may be some other subtleties than \valid and \initialized lying around.

AllanBlanchard commented 3 years ago

Related to https://github.com/acsl-language/acsl/issues/49

correnson commented 3 years ago

I have the feeling that another program point is important, which is before lval = retval and after closing the block that allocated the formals. I have the feeling that the real choice for Post is before affecting the target lval which is actually unknown when proving the function in a modular setting.

What distinguish this point from Post2 is the allocation of formals, although \result still have an address.

// Pre1 
{ 
   rettype retvar;
// Pre2
  { 
     prms = args;
// Pre3
    {
        body;
        retvar = r;
// Post1
    }
// Post2
  }
// Post2bis
 lval = retvar;
// Post 3
 }
vprevosto commented 3 years ago

Strictly in terms of what is allocated, Post2bis and Post3 are equivalent (they're in the exact same block). But indeed it makes more sense to put the label right after closing the inner block.

vprevosto commented 3 years ago

that said, I think there's a typo in the code with respect to the position of the Post* labels: in the current setting, in both Post1 and Post2 everything is allocated, while in Post3 retvar is still allocated. The original code template should read:

// Pre1 
{ 
   rettype retvar;
// Pre2
  { 
     prms = args;
// Pre3
    {
        body;
        retvar = r;
    }
// Post1
  }
// Post2
 lval = retvar;
}
// Post3
correnson commented 3 years ago

Nice !

However, I'm afraid that, even if you place Post at Post3 (in either snippets) instead of Post2 or Post2bis, you shall not replace \result with lval in the post-conditions : what if lval is aliased in the post ?

//@ ensures \result == \old(*p) && *p == 0; assigns *p;
int f(int *p) { int v = *p ; *p = 0; return v; }
}

void main(void) {
    int x = 1;
 A: x = f(&x);
    //@ check Subst: x == \at(*&x,A) && *&x == 0;
}

And more generally, if you put Place at Post3, I don't know how you can prouve f in a modular way.

vprevosto commented 3 years ago

Indeed, we can't do this substitution. On the other hand, this is strictly a problem of the caller. Hence, I do not understand the issue with f. With main, on the other hand, we indeed have a problem. But I'm afraid it is in fact deeply ingrained in the C standard, and more precisely 6.5.2.2§10:

There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call. Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function

There's no sequence point in main between A and the ; after the call to f. It could thus be argued that a compiler smart enough to detect that f will return the value contained in x could choose to make the assignment in main before the call to f, potentially optimize it away since it would amount to x=x, and only after that perform the side effects of f itself, putting 0 into x.

correnson commented 3 years ago

Yes indeed, there is actually an undefined sequence hidden behind the scene with a temporary local variable to hold the result. Something like the following pseudo-code (assuming Post is in Post2):

int tmp ;
__undefined_sequence__ {
    tmp = f(prms);
    //@ assert Ensures( args <- prms , \result <- tmp );
    lval = tmp;
}

Regarding the issue with f, I just mean that if Post is placed in Post3, when proving the ensures of f you can not figure out what \result would be unless you precisely know what lval is, hence you can not perform modular proofs.

vprevosto commented 3 years ago

Actually it's not "undefined" (in the dreaded "undefined behavior" :fearful: sense), but "indeterminately sequenced", meaning that after the ; we can only end up in one of the two situations, either x == \at(x,A) or x==0.

But this is still an issue of course 😁

pbaudin commented 3 years ago

In 3 of your previous code schemes, there is the assigment prms = args;. But, in order to remove any ambiguity, where do you place the declaration prmstype prms; ?

vprevosto commented 3 years ago

at the same place (it was meant to represent an initialization rather than a simple assignment).

correnson commented 3 years ago

A short example illustrating most of the problems to be discussed :

/*@
   requires InValid: \valid(&p);
   requires InInit:  \initialized(&p);
   ensures OutValid: \valid(&p);
   ensures OutInit:  \initialized(&p);
   ensures RetValid: \valid(\result);
   ensures RetInit:  \initialized(\result);
*/
struct S *callee(struct S p) { return &p; }

void caller(struct S p) {
   struct S * r ;
   r = callee(p);
   //@ assert WRONG: \valid(r) && \initialized(r);
}
vprevosto commented 3 years ago

If we agree on Post meaning Post2 (in the sense of this comment), OutValid and RetValid do not hold (I won't commit to the status of OutInit and RetInit, as the initialization status of an invalid location is dubious at best).

vprevosto commented 2 years ago

@claudemarche for information

pbaudin commented 2 years ago

I'm not a great fan of that choice because the ensuresclause wouldn't give an external view of the specified function. How can you specifyfree` function ? Did I miss something?

claudemarche commented 2 years ago

I guess the reason why @vprevosto notifies me this thread is the oral discussion we had last Wednesday about the ACSL chapter of the FC book, where the understanding of the meaning of ACSL specifications is a key. I'm planning to give a semi-formal explanation of the intended meaning of the validity of a contract (in terms of a blocking semantics a la Paolo Herms), that should at least say some non trivial and important things about the parameters of a function. In essence, one should have in mind that the parameters have a meaning in the pre-state of a call, but not in the post-state. This choice seems good to me, and is consistent that even if such a parameter is mutated in the function's body, this mutation is not visible to the caller, and the "final" value is inaccessible to it. It also explains well why the kernel adds \old around all parameters when they are mentioned in post-conditions.

Your example adds some difficulty due to the dreadful address operator (the general advice being: use it only when you are sure you know what you are doing...). Taking the address of a parameter means that the allocation and deallocation of the memory cells where the parameter values are stored become "visible" in the ACSL specifications... Too bad for the good old principle of abstraction, sigh... In that case it seems to me that we should say that the pre is supposed to hold after allocation of the parameters, and the post is supposed to hold after their deallocation. I'm not sure your presentation as a ``program transformation'' is really appropriate but let me try:

rettype retvar;   // allocates retvar
... prms; // allocates parameters 
prms = args; // initialize parameters
// pre holds here I think
body;
lval = retvar;
// deallocates retvar and prms
// post holds here I think

so in an example like

//@ ensures \valid(\result);
struct S *callee(struct S p) { 
struct S *tmp = &p;
//@ assert \valid(tmp);
return tmp; 
}

the assert is valid but not the ensures.

Hope it is quite clear and that it may helps...

pbaudin commented 2 years ago

Thank @claudemarche, I now see what I missed; the proposed choice is Indeed a good one.