incorrect (?) transformation around icmp

Quuxplusone commented 4 years ago


Bugzilla Link	PR46380
Status	NEW
Importance	P enhancement
Reported by	Ralf Jung (post+llvm@ralfj.de)
Reported on	2020-06-18 06:34:44 -0700
Last modified on	2020-09-11 17:43:59 -0700
Version	trunk
Hardware	PC Linux
CC	comexk@gmail.com, efriedma@quicinc.com, hfinkel@anl.gov, James.Dutton@gmail.com, jdoerfert@anl.gov, jeroen.dobbelaere@synopsys.com, juneyoung.lee@sf.snu.ac.kr, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, llvm@meinersbur.de, nunoplopes@sapo.pt, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also	PR47090

(Bugzilla made me pick a component, so I made a wild guess. I do not have the
slightest idea which of these internal LLVM components is responsible here.
Would be nice if I could select "unknown"...)

In https://bugs.llvm.org/show_bug.cgi?id=34548#c99, Eli Friedman wrote

> icmp is defined to just take the raw pointer bits as an integer.
> If some transform isn't consistent with this, please file a bug.

I think Juneyoung found a transformation that is indeed not consistent with
this, which I adjusted as follows (https://godbolt.org/z/XYQ7Vx):

define i1 @compare(i32* %p, i32* %q) {
  %c = icmp eq i32* %p, %q
  ret i1 %c
}
define void @src() {
  %p = alloca i32
  %q = alloca i32
  call void @llvm.lifetime.start.p0i32(i64 1, i32* %p)
  call void @llvm.lifetime.end.p0i32(i64 1, i32* %p)
  call void @llvm.lifetime.start.p0i32(i64 1, i32* %q)
  %c = call i1 @compare(i32* %p, i32* %q)
  br i1 %c, label %A, label %B
A: ; compare() == true
  call void @f(i1 true)
  %c2 = icmp eq i32* %p, %q
  call void @f(i1 %c2)
  br label %EXIT
B: ; compare() == false
  call void @f(i1 false)
  %c3 = icmp eq i32* %p, %q
  call void @f(i1 %c3)
  br label %EXIT
EXIT:
  call void @llvm.lifetime.end.p0i32(i64 1, i32* %q)
  ret void
}

The function "src" compares "p" and "q" twice, once inside "compare". It calls
"f" twice with the two results of the comparison. The first comparison is
passed via indirect information flow, i.e., the equivalent of "if p == q {
f(true) } else { f(false) }" in Rust. "p" and "q" could be equal or not, so
this function has two possible executions: either "f" gets called twice with
"true" as argument, or it gets called twice with "false" as argument.

The transformed program (with "opt -instsimplify") is

define i1 @compare(i32* %p, i32* %q) {
  %c = icmp eq i32* %p, %q
  ret i1 %c
}
define void @src() {
  %p = alloca i32, align 4
  %q = alloca i32, align 4
  call void @llvm.lifetime.start.p0i32(i64 1, i32* %p)
  call void @llvm.lifetime.end.p0i32(i64 1, i32* %p)
  call void @llvm.lifetime.start.p0i32(i64 1, i32* %q)
  %c = call i1 @compare(i32* %p, i32* %q)
  br i1 %c, label %A, label %B
A:                                                ; preds = %0
  call void @f(i1 true)
  call void @f(i1 false)
  br label %EXIT
B:                                                ; preds = %0
  call void @f(i1 false)
  call void @f(i1 false)
  br label %EXIT
EXIT:                                             ; preds = %B, %A
  call void @llvm.lifetime.end.p0i32(i64 1, i32* %q)
  ret void
}

Notice how in block A, "f" gets called with two *different* values, which
should be impossible because the original program only calls f with two times
the same value.

Quuxplusone commented 4 years ago

Since %p and %q are different allocas, they will have two different values,
including different integer representations (both alloca are 4 bytes).

InstCombine is making the deduction

%p = alloca i32, align 4
%q = alloca i32, align 4
%c2 = icmp eq i32* %p, %q

to

%c2 = false

because of this. It does not change the compare function because InstCombine is
not interprocedural.

The basic block A is dead code, it will never be executed. Dead code can be
arbitrarily wrong.

Quuxplusone commented 4 years ago

The alloca have disjoint lifetimes. That means LLVM may actually allocate them to the same stack slot.

Quuxplusone commented 4 years ago

I think the issue you're talking about is something like the following?

void z(long*);
int a() {
    long raddr1, raddr2;
    {
       long r; z(&r); raddr1 = (long)&r;
    }
    {
       long r; z(&r); raddr2 = (long)&r;
    }
    return raddr1 == raddr2;
}

clangs allocate the two versions of "r" at the same address, but folds the
compare to "false".

This isn't really an integer icmp vs. pointer icmp thing; it's more of a "our
representation of local variable lifetimes is wrong" thing.

Interestingly, both gcc and msvc do the same thing.

Quuxplusone commented 4 years ago

(In reply to Ralf Jung from comment #2)
> The alloca have disjoint lifetimes. That means LLVM may actually allocate
> them to the same stack slot.

Correct, I missed that. However, I wonder why/whether it is legal to shorten
the lifetime even though the pointers are still used for the compare call.

I case of Eli's example, I think C++20 justifies it in
http://eel.is/c++draft/basic.stc#4:
> When the end of the duration of a region of storage is reached, the values of
all pointers representing the address of any part of that region of storage
become invalid pointer values. Indirection through an invalid pointer value and
passing an invalid pointer value to a deallocation function have undefined
behavior. Any other use of an invalid pointer value has implementation-defined
behavior.

Note that all compilers optimize the function even without the ptr2int cast:
https://godbolt.org/z/MDp3Ut

Quuxplusone commented 4 years ago

(In reply to Michael Kruse from comment #4)
> (In reply to Ralf Jung from comment #2)
> > The alloca have disjoint lifetimes. That means LLVM may actually allocate
> > them to the same stack slot.
>
> Correct, I missed that. However, I wonder why/whether it is legal to shorten
> the lifetime even though the pointers are still used for the compare call.

If you're talking about C/C++ variables, the lifetime is defined by the
standard.  In my example, the two variables aren't live at the same time, so
the addresses may or may not be numerically equal.  (The problem is just the
contradiction.)  If two variables are live at the same time, the addresses must
not be numerically equal.

If you're talking about the semantics of LLVM, according to the LangRef, two
allocas can't have the same address.  The rules for alloca specify that the
memory is not freed until the function returns. And the rules for
llvm.lifetime.start/end don't say anything about the address, only that
accessing the memory is undefined outside the lifetime.  Therefore, it's
illegal for stack coloring to allocate two variables whose address escapes at
the same address.

Obviously stack coloring does in fact allocate variables at the same address.
And we can't fix it to match LangRef without causing a significant regression
for C++ code.

> I case of Eli's example, I think C++20 justifies it in
> http://eel.is/c++draft/basic.stc#4:

raddr1 and raddr2 are pointers, not integers; the rule in question doesn't
apply.

Quuxplusone commented 4 years ago

(In reply to Eli Friedman from comment #5)=
> raddr1 and raddr2 are pointers, not integers; the rule in question doesn't
> apply.

Somehow got this backwards; meant to say "integers, not pointers".

Quuxplusone commented 4 years ago

> I think the issue you're talking about is something like the following?

Not sure if "you" is me here, but I agree that is a similar example -- z could
observe the two addresses and so we have a contradiction, like in my example.

>  (The problem is just the contradiction.)

Agreed: whether or not the addresses are equal is non-deterministic. That's why
I said my example has two possible executions. It's just, "f(true); f(false);"
is not one of them.

> raddr1 and raddr2 are integers, not pointers; the rule in question doesn't
apply.

The rule in question is a C++ rule, so it applies to Eli's example, but AFAIK
LLVM has no such rule and thus it does not apply to my example. Is that not
right?

> If you're talking about the semantics of LLVM, according to the LangRef, two
allocas can't have the same address.
> The rules for alloca specify that the memory is not freed until the function
returns. And the rules for llvm.lifetime.start/end don't say anything about the
address, only that
> accessing the memory is undefined outside the lifetime.  Therefore, it's
illegal for stack coloring to allocate two variables whose address escapes at
the same address.

Oh, that is surprising. Why would LLVM explicitly say that about alloca and not
about malloc? And what about popping and then pushing the same stack frame, so
that a stack slot ends up at the same physical address -- that would also be
two allocas with the same address (just different in provenance, similar to
malloc reusing memory), why is that not a problem?

I thought the entire point of these lifetime annotations was to let LLVM re-use
the same stack slots for different variables, and this rule seems to make that
unnecessarily hard.

> according to the LangRef, two allocas can't have the same address.
All I could find in the LangRef is
>  Allocating zero bytes is legal, but the returned pointer may not be unique.
But this doesn't say that otherwise, it is unique.

I wonder if it would be possible to adjust lifetime.end semantics to say that
this may (or may not) free the memory for that alloca, to permit reusing
addresses.

Quuxplusone commented 4 years ago

(In reply to Ralf Jung from comment #7)
> Oh, that is surprising. Why would LLVM explicitly say that about alloca and
> not about malloc?

malloc'ed memory is allocated until you call free(), according to the C
standard.  This is a similar thing, except that it's freed implicitly when the
function returns.  To be clear, I only mean allocas which are live at the same
time; obviously later function calls can reuse the same stack.

(For dynamic allocas, there's also stacksave/stackrestore; a stackrestore also
frees memory.  But those are pretty rare in most code.)

> > according to the LangRef, two allocas can't have the same address.
> All I could find in the LangRef is
> >  Allocating zero bytes is legal, but the returned pointer may not be unique.
> But this doesn't say that otherwise, it is unique.

I think it's implied by "allocating" memory; two allocations can't overlap.
This roughly matches the semantics of the languages we're trying to model.

> I wonder if it would be possible to adjust lifetime.end semantics to say
> that this may (or may not) free the memory for that alloca, to permit
> reusing addresses.

Maybe.  It would be pretty inconvenient to reason about, since the uses of the
memory aren't tied to the actual allocation. And it would be difficult to
describe the rules in an intuitive way.  But I can't think of any showstoppers.

Quuxplusone commented 4 years ago

> To be clear, I only mean allocas which are live at the same time; obviously
> later function calls can reuse the same stack.

These are not live at the same time though? I do not recall the exact wording,
but as far as I know C does not require local variables to be live for the
entire function; when they are declared in a sub-scope like in your example,
they can be deallocated earlier.

So given that LLVM encodes these source-level scopes into its IR using
lifetime.start/end, it seems odd to say that allocas are "freed implicitly when
the function returns". It seems more useful, and closer to C, to instead say
they are "freed implicitly when their lifetime ends". This also reflects what
stack coloring is doing.

> I think it's implied by "allocating" memory; two allocations can't overlap.
> This roughly matches the semantics of the languages we're trying to model.

In my mental model, an "alloca" is only really allocated once it becomes live
(lifetime.start), and it gets deallocated when its live range ends. LLVM does
not have "scopes"/"blocks" like C or many other source languages do, so what
should those languages do when they want stack variables to be allocated only
when their scope begins, and deallocated when their scope ends (which I think
is common)? They add lifetime.start/end.

(There is a special case for alloca that don't have lifetime intrinsics
associated with those; those are indeed live until the function returns. This
does create some nasty corner cases, which has been the subject of discussion
in Rust for quite a while, but progress has been hard as it is not very clear
what exactly LLVM's requirements are. See e.g. https://github.com/rust-
lang/rust/issues/42371.)

Quuxplusone commented 4 years ago

(I got a sendmail error from the web frontend, re-triggering notifications.)

Quuxplusone commented 4 years ago

(More email errors.)

Quuxplusone commented 4 years ago

(There have been posts but no notifications.)

Quuxplusone commented 4 years ago

FYI: In Alive2, the semantics of alloca depends on whether it has lifetime
intrisic uses at parsing time. :)
If alloca has any uses with lifetime intrinsics, it is truly allocated at
lifetime.start. Before that, it is assigned a block id, but it does not occupy
the space and accessing it is UB.
If alloca is not used by any lifetime intrinsics, it is immediately allocated
at the definition.
This has worked pretty well so far while testing with LLVM unit tests and
running -O3 from a few agglomerated C single programs.

Quuxplusone commented 4 years ago

For more details which might be related with the github link:
- Having lifetime.start twice on the same alloca does not reallocate the block
- Alive2 does not consider executions that has two live allocas (and mallocs of
course) have addresses overlapped

Quuxplusone commented 4 years ago

FYI: In Alive2, the semantics of alloca depends on whether it has lifetime intrisic uses at parsing time. :)

That is funny, this is exactly what Miri also does.^^

If alloca has any uses with lifetime intrinsics, it is truly allocated at lifetime.start. Before that, it is assigned a block id, but it does not occupy the space and accessing it is UB.

Interesting, Miri does not even reserve a block ID. Each new lifetime.start generates a fresh block ID. This models that we want pointers to be invalidated when the lifetime of a local ends, even if the same local becomes live again later. (Compilation from that to Alive2's semantics seems fine, it just restricts allocator choice.)

Quuxplusone commented 4 years ago

Hmm, interesting. Alive2 actually treats comparsions differently if there are lifetime intrinsics: https://alive2.llvm.org/ce/z/xWcu-2 .

Quuxplusone commented 4 years ago

(In reply to Eli Friedman from comment #16)
> Hmm, interesting. Alive2 actually treats comparsions differently if there
> are lifetime intrinsics: https://alive2.llvm.org/ce/z/xWcu-2 .

In theory it doesn't (modulo bugs, of course).

In this case, since the 2nd alloca is only allocated after the first one is
deallocated, the comparison may return either true or false non-
deterministically (depending on where the 2nd alloca ends up). If you check,
Alive2 is ok with optimizing that function to either return true or false.

If there are no lifetime intrinsics, then it's guaranteed that the two allocas
must be in disjoint memory, hence the pointer comparison can only be folded to
false.

Quuxplusone commented 4 years ago

A Rust user found another case where comparison with icmp is non-deterministic: https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=8affd2a65b02a35e16dbab4682cb8886.

What this does is it take pointers to two functions that are marked as "unnamed", casts those pointers to integers, and then compares them twice -- once after passing them through a "black box" noinline function, and once immediately. The compiler seemingly optimized the latter comparison to "false", but later makes the two function's addresses identical.

If we describe the semantics of this program at the LLVM IR level, then I think we have to either accept that "icmp" can produce different results for the same input (it is non-deterministic), or the program has to have UB. But it's not UB to compare "unnamed function" pointers, right? My expectation was that unnamed function can end up having the same or different addresses depending on optimizer decisions etc, but whether it is one or the other has to be consistent throughout a single execution of the program.

Quuxplusone commented 4 years ago

(In reply to Ralf Jung from comment #18)
> A Rust user found another case where comparison with icmp is
> non-deterministic:
> https://play.rust-lang.org/
> ?version=stable&mode=release&edition=2018&gist=8affd2a65b02a35e16dbab4682cb88
> 86.
>
> What this does is it take pointers to two functions that are marked as
> "unnamed", casts those pointers to integers, and then compares them twice --
> once after passing them through a "black box" noinline function, and once
> immediately. The compiler seemingly optimized the latter comparison to
> "false", but later makes the two function's addresses identical.
>
> If we describe the semantics of this program at the LLVM IR level, then I
> think we have to either accept that "icmp" can produce different results for
> the same input (it is non-deterministic), or the program has to have UB.
> But it's not UB to compare "unnamed function" pointers, right?  My
> expectation was that unnamed function can end up having the same or
> different addresses depending on optimizer decisions etc, but whether it is
> one or the other has to be consistent throughout a single execution of the
> program.

You're right, whether two addresses are the same or different has to be chosen
consistently.  This is also why the merging transform on unnamed_addr is "one-
way": you can merge two globals into one global, but you can't split one global
into two.  (I remember discussing this at some point for constants, but I can't
seem to find it.)

This doesn't really come up in C: the only unnamed_addr globals with a user-
accessible address are strings, and we fold identical strings in the frontend.

This is only loosely related to the alloca issue; please file a separate bug.

Quuxplusone commented 4 years ago

All right, opened https://bugs.llvm.org/show_bug.cgi?id=47090.

Quuxplusone commented 4 years ago

For Eli F example:
void z(long*);
int a() {
    long raddr1, raddr2;
    {
       long r; z(&r); raddr1 = (long)&r;
    }
    {
       long r; z(&r); raddr2 = (long)&r;
    }
    return raddr1 == raddr2;
}

At the "return raddr1 == raddr2;"  isn't both raddr1 and raddr2 undefined
because r is out of scope.

So, comparison of two "undefined" is probably undefined also.
So, shouldn't compiling this instead give a compiler warning, instead of an
alias analysis problem?

Quuxplusone commented 4 years ago

@James
raddr1 and raddr2 are both (long) integers, which should have fixed values at
the point of comparison.

I don't think they are poison values, which would mean they are initially non-
poison, but become poison when leaving the scop.

Quuxplusone / LLVMBugzillaTest

incorrect (?) transformation around icmp #45349