setjmp()/longjmp() control-flow

Quuxplusone commented 14 years ago


Bugzilla Link	PR5305
Status	NEW
Importance	P enhancement
Reported by	Peter Bengtsson (tesla@och.nu)
Reported on	2009-10-26 07:02:23 -0700
Last modified on	2016-07-27 09:42:29 -0700
Version	2.6
Hardware	PC Linux
CC	geoffers+llvmbugs@gmail.com, gs_tor@yahoo.ca, jrose@belkadan.com, kremenek@apple.com, llvm-bugs@lists.llvm.org, mnemo@minimum.se, nbowler@draconx.ca, xu_zhong_xing@163.com
Fixed by commit(s)
Attachments	`bad.c` (152 bytes, text/x-csrc)
Blocks
Blocked by
See also

Created attachment 3700
Sample code.

In code like:

1: int main(void)
2: {
3: char *a;
4: jmp_buf jb;
5:
6: if(setjmp(jb)) {
7:  free(a);
8:  return(0);
A: }
B:
C: a=malloc(1);
D: longjmp(jb,1);
E: }

The static analyzer complains both of a dead assignment
to "a" (line C) and that "a" is undefined (line 7).

It would be nice if it could handle this. (in the cases where
it can see both the setjmp() and longjmp(), or perhaps using
annotations to declare functions as possibly longjmp():ing)

Quuxplusone commented 14 years ago

Attached bad.c (152 bytes, text/x-csrc): Sample code.

Quuxplusone commented 14 years ago

Both warnings make use of Clang's source-level CFG, which has no special handling of setjmp/longjmp.

Quuxplusone commented 14 years ago

(In reply to comment #1)
> Both warnings make use of Clang's source-level CFG, which has no special
> handling of setjmp/longjmp.
>

I should add that this is something we'd like to do in the CFG.

Quuxplusone commented 14 years ago

Maybe I'm missing something, but isn't the complaint basically legitimate (even
if arrived at by accident..)? The usual caveat for setjmp/longjmp is that
locals modified after setjmp returns 0 may or may not be restored by longjmp,
so the 'free(a)' is not valid, and the assignment to 'a' can be considered
dead, by lacking valid uses.

But of course, tracing the flow properly would allow better diagnostics, and
detection of other cases; if you wrote char*a=NULL; then the code would still
be improper but the warning about 'a' not being initialized would not be issued.

In general, however, longjmp is called in a different function; as you say it
would be good to be able to annotate functions as possibly longjmping - it's
probably more practical to identify functions that won't longjmp, since the
compiler needs to assume a function can longjmp if there's no information to
the contrary.

This case where (a) longjmp occurs in same function as setjmp; and (b) compiler
could prove that the jmp_buf has not been modified by other calls in between;
is easily constructed in an example, but doesn't seem to be a practical use of
setjmp/longjmp. So, special treatment of this case doesn't seem justified IMO.
There are practical cases, I guess, where longjmp would go to a setjmp in the
same function, but longjmps to the same place from other (called) functions
would also exist (otherwise, use a 'goto').

In practice, the answer to this may be to identify a convention for the use of
local variables in functions containing setjmp, and produce warnings when it's
not followed, e.g:
   - locals modified prior to setjmp() may be read, but not modified, in code executed after setjmp returns 0.
   - locals modified after setjmp returns 0 may not be in read in code executed after setjmp returns 1.

This is the approach I follow, it would be nice to have it checked.
Maybe a bit messy for the general situation; it's unfortunate there's no
explicit 'scope' for a setjmp, after which you know a longjmp can't occur any
more - other than the entirety of the function of course. tagging non-
longjmping functions would allow the actual scope to be narrowed.

And I'm pretty sure it's generally unsafe to setjmp inside a {} containing
local vars defs, then after exiting the block, call a function which longjmps
back into the block -- but does that mean the compiler can assume that
functions after the block won't longjmp?

I find that setjmp/longjump, though widely shunned, can solve problems that are
quite difficult to solve in other ways; but they require due care in
application. It would be great if there were improvements in static analysis of
their use, even if a few additional restrictions were needed to enable this.

Quuxplusone commented 14 years ago

(In reply to comment #3)

> In general, however, longjmp is called in a different function; as you say it
> would be good to be able to annotate functions as possibly longjmping - it's
> probably more practical to identify functions that won't longjmp, since the
> compiler needs to assume a function can longjmp if there's no information to
> the contrary.

Modeling setjmp/longjmp in the CFG will also be useful for inter-procedural
analysis, which is actively being worked on.  From my perspective, modeling
setjmp/longjmp is along the same lines of modeling exception edges in the CFG
(which we optionally do for C++).

FWIW, I haven't seen many false positives related to not explicitly modeling
setjmp/longjmp.

Quuxplusone commented 14 years ago

(In reply to comment #3)

> And I'm pretty sure it's generally unsafe to setjmp inside a {} containing
> local vars defs, then after exiting the block, call a function which longjmps
> back into the block -- but does that mean the compiler can assume that
> functions after the block won't longjmp?

Modeling setjmp/longjmp in the CFG will not be used for compiler correctness
(of codegen), but it will impact analysis-based warnings in the compiler and
the static analyzer.

Depending on how much information the analyzer has there are likely reasonable
heuristics on when to emit a warning when one potentially does a longjmp into a
destroyed scope.  With inter-procedural analysis, we may be able to tell that a
called function does a longjmp, and thus emit a warning for such cases.
Without such information it is unlikely the analyzer could emit a warning with
a high signal-to-noise ratio.

Quuxplusone commented 8 years ago

(In reply to comment #3)
> Maybe I'm missing something, but isn't the complaint basically legitimate
> (even if arrived at by accident..)? The usual caveat for setjmp/longjmp is
> that locals modified after setjmp returns 0 may or may not be restored by
> longjmp, so the 'free(a)' is not valid, and the assignment to 'a' can be
> considered dead, by lacking valid uses.

Well, the language standard says what happens to variables on a longjmp.
However, the test case does have a problem, and its behaviour is undefined.
The specific language rule that applies is this one (C11 7.13.2.1p3):

  'all accessible objects have values ... as of the time the longjmp function
   was called ... except that the values of objects of automatic storage
   duration that are local to the function containing the invocation of the
   corresponding setjmp macro that do not have volatile-qualified type and
   have been changed between the setjmp invocation and the longjmp call are
   indeterminate'.

The behaviour of the test case is undefined because 'a' meets all the criteria
above, so its value is indeterminate after the longjmp, and it is subsequently
evaluated.

But the bug reported is nevertheless valid.  We can easily correct the test
case by adding a volatile qualifier to the declaration of a:

  char * volatile a;

and the static analyzer still complains about the uninitialized value (seems
that volatile suppresses dead store warnings, at least in the latest versions).

> And I'm pretty sure it's generally unsafe to setjmp inside a {} containing
> local vars defs, then after exiting the block, call a function which
> longjmps back into the block

Unless the block contains declarations of identifiers with variably modified
type, such jumps are generally allowed (although it is not allowed to longjmp
into functions that have returned).

Quuxplusone / LLVMBugzillaTest

setjmp()/longjmp() control-flow #5801