Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[Statepoint-VReg] Cornercase bug with multiple uses of a single value #45886

Closed Quuxplusone closed 3 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR46917
Status RESOLVED FIXED
Importance P enhancement
Reported by listmail@philipreames.com
Reported on 2020-07-30 12:03:15 -0700
Last modified on 2020-10-14 07:45:34 -0700
Version trunk
Hardware PC Linux
CC dantrushin@gmail.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
There is a hard to handle cornercase in our experimental vreg statepoint
lowering.  I had briefly described the problem in
https://reviews.llvm.org/D81648?vs=on&id=277132#2146051 as such:
"Second, there appears to be a semantic problem around the handling of base vs
derived slots unless we *always* spill the base. We can't tie both uses to a
single def. This may warrant some offline discussion."

(I decided to land that patch with the issue unresolved to unblock progress
since review discussion was fragmented and hard to follow.)

Let me expand a bit on the issue.  Say we have an example like the following:
target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

declare void @foo()
declare void @consume(i8 addrspace(1)*)

define void @test(i8 addrspace(1)* %a) gc "statepoint-example" {
entry:
  %a_gep = getelementptr i8, i8 addrspace(1)* %a, i64 8
  %safepoint_token = tail call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0) ["gc-live" (i8 addrspace(1)* %a, i8 addrspace(1)* %a_gep)]
  %a1 = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %safepoint_token, i32 0, i32 1)
  call void @consume(i8 addrspace(1)* %a1)
  ret void
}

declare token @llvm.experimental.gc.statepoint.p0f_i1f(i64, i32, i1 ()*, i32,
i32, ...)
declare token @llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32, void ()*,
i32, i32, ...)
declare i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token, i32, i32)

In this example, we generate a statepoint MI node which looks like this:
%6:gr64 = STATEPOINT 0, 0, 0, @foo, 2, 0, 2, 0, 2, 0, 1, 8, %stack.0, 0, killed
%7(tied-def 0), csr_64, implicit-def $rsp, implicit-def $ssp :: (volatile load
store 8 on %stack.0)

(Where %7 is the GEP, and %stack.0 is the spill for the base)

Imagine we had instead generated:
%6:gr64 = STATEPOINT 0, 0, 0, @foo, 2, 0, 2, 0, 2, 0, 1, 0, %7, 0, killed
%7(tied-def 0), csr_64, implicit-def $rsp, implicit-def $ssp :: (volatile load
store 8 on %stack.0)
(That is, replaced stack with another usage of %7.)

This would be incorrect.  Why?  Because the second use of %7 can not be tied to
the %6 def.  As a result, the fact that the GC may need to update the base -
remember objects may move many times during a call - is lost.  The register
allocator is free to assign different locations to the the two uses of %0, and
then *assume one of them is read only*.  That would be a miscompile.

To say all that different, we tie *operands*, not registers.

Unfortunately, the very slightly tweaked example produces exactly this effect:
define void @test(i8 addrspace(1)* %a) gc "statepoint-example" {
entry:
  %a_gep = getelementptr i8, i8 addrspace(1)* %a, i64 8
  %safepoint_token = tail call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0) ["gc-live" (i8 addrspace(1)* %a, i8 addrspace(1)* %a_gep)]
  %a1 = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %safepoint_token, i32 0, i32 0)
  call void @consume(i8 addrspace(1)* %a1)
  ret void
}

This produces:
%1:gr64 = STATEPOINT 0, 0, 0, @foo, 2, 0, 2, 0, 2, 0, %0, %0(tied-def 0),
csr_64, implicit-def $rsp, implicit-def $ssp

Which is WRONG.

One possible correct alternative would be:
%1:gr64, %2:gr64 = STATEPOINT 0, 0, 0, @foo, 2, 0, 2, 0, 2, 0, %0(tied-def 0),
%0(tied-def 1), csr_64, implicit-def $rsp, implicit-def $ssp

Note that while I've given the example in terms of a single base/derived pair,
you can also construct the same problematic pattern by relocating a single
object twice (or otherwise having multiple gc operand uses of the same value).

The two major families of fixes I see are:
1) Produce a separate tied def for each use copy.
2) Produce at most one use of each value - thus, eagerly spilling further uses.

The second seems like a hard invariant to preserve during optimization, so I'd
tend towards the first.

The first does result in slightly poor codegen since we can end up spilling the
same value twice.  I don't have a simple fix for that.  We could potentially
change the statepoint MI format to remove the dual use, but that's a bit
involved.
Quuxplusone commented 4 years ago
I've been assuming the following holds:

1. statepoint kills all OOPs; all pointers live across statepoint must be
relocated
2. gc.relocate relocate single pointer; in pair (base, derived) it relocates
second (derived) one.

From that I conclude that if base pointer is live across statepoint, it MUST
appear as (base, base) in gc args list.

I also think that all above means that from code generator point of view
(correctness),
in every single (base, derived) pair, base pointer is irrelevant to codegen.
It only is needed for (easier) generation of stack maps later.

All above means IMHO that

%1:gr64 = STATEPOINT 0, 0, 0, @foo, 2, 0, 2, 0, 2, 0, %0, %0(tied-def 0),
                     csr_64, implicit-def $rsp, implicit-def $ssp

is valid construct as long as tied operands are assigned to the same register
and that register is used past statepoint for %0. Even though I've never seen
this in practice,
I think it is OK for regalloc to assign untied %0 to something else (another
register or spill
 slot) and possibly treat it read only, as you say.

Note that after SSA 'flattening' this will become

%0:gr64 = STATEPOINT 0, 0, 0, @foo, 2, 0, 2, 0, 2, 0, %0, %0(tied-def 0),
                     csr_64, implicit-def $rsp, implicit-def $ssp

I don't think regalloc is allowed to perform this assignment:

$r12 = STATEPOINT 0, 0, 0, @foo, 2, 0, 2, 0, 2, 0, $r12, $r13(tied-def 0),
                  csr_64, implicit-def $rsp, implicit-def $ssp

and this would be OK:

$r12 = STATEPOINT 0, 0, 0, @foo, 2, 0, 2, 0, 2, 0, $r13, $r12(tied-def 0),
                  csr_64, implicit-def $rsp, implicit-def $ssp

$r13 will (must) be dead after statepoint.
In practice, both %0 arguments are allocated to the same register.

As for

%1:gr64, %2:gr64 = STATEPOINT 0, 0, 0, @foo, 2, 0, 2, 0, 2, 0, %0(tied-def 0),
                 %0(tied-def 1), csr_64, implicit-def $rsp, implicit-def $ssp

how would it look like after regalloc?
I don't think the following is possible:

$r12, $r12 = STATEPOINT 0, 0, 0, @foo, 2, 0, 2, 0, 2, 0, $r12(tied-def 0),
                 $r12(tied-def 1), csr_64, implicit-def $rsp, implicit-def $ssp
Quuxplusone commented 4 years ago
(In reply to Denis Antrushin from comment #1)
> I also think that all above means that from code generator point of view
> (correctness),
> in every single (base, derived) pair, base pointer is irrelevant to codegen.
> It only is needed for (easier) generation of stack maps later.
This is the key mistake.  This statement is false.

The base pointer must be available to the GC *during the call*.  Think about a
dynamic trace which looks something like this:

call foo w/stackmap {reg, stackslot} // i.e. single base/derived
  safepoint
  safepoint
return from foo

At the safepoint, the GC will need to have available and update the base
pointer value.  Without it, it can't correctly relocate the derived pointer.
Quuxplusone commented 3 years ago
D87154 fixes this problem by listing every pointer operand only once.
base/derived relation is enconded by following integer arguments (indices
into that pointer list)