Open Quuxplusone opened 8 years ago
Bugzilla Link | PR30556 |
Status | NEW |
Importance | P normal |
Reported by | Carrot (carrot@google.com) |
Reported on | 2016-09-28 17:07:41 -0700 |
Last modified on | 2017-03-29 23:43:28 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | echristo@gmail.com, ehsanamiri@gmail.com, hfinkel@anl.gov, kit.barton@gmail.com, llvm-bugs@lists.llvm.org, nemanja.i.ibm@gmail.com |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
This is the relevant part of IR before lowering (I reduced the test case a
little more, so there might be irrelevant differences)
invoke.cont3:
store i64 %3, i64* %f1.i49, align 8, !tbaa !13
store i32 %conv.i, i32* %f2.i, align 8, !tbaa !16
store i32 %conv3.i, i32* %f3.i, align 4, !tbaa !17
%cmp.i.i = icmp eq %struct.O* %21, %22
br i1 %cmp.i.i, label %if.else.i.i, label %if.then.i.i
if.then.i.i: ; preds = %invoke.cont3
%23 = bitcast %struct.O* %o.i to i8*
%24 = bitcast %struct.O* %21 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %24, i8* nonnull %23, i64 16, i32 8, i1 false) #6, !tbaa.struct !22
if.else.i.i: ; preds = %invoke.cont3
%f4.i51 = bitcast %class.B* %b to %"class.std::vector.5"*
invoke void @_ZNSt6vectorI1OSaIS0_EE19_M_emplace_back_auxIJRKS0_EEEvDpOT_(%"class.std::vector.5"* nonnull %f4.i51, %struct.O* nonnull dereferenceable(16) %o.i)
to label %invoke.cont5 unwind label %lpad4
So if the vector is full, we go to the else branch where we pass a pointer to
o.i to _M_emplace_back_aux. In this case, storing to memory is needed.
We need to
(1) recognize that the input to memcpy is stored in the predecessor block, and
it is avaialble in SSA variables, so memcpy can be replaced by store to avoid
extra load.
(2) Once we do this, we need a forward store motion optimization, to move
stores to the else branch.
Memcpy is from a function void __gnu_cxx::new_allocator<O>::construct<O, O
const&>(O*, O const&), and ends up in function bar after multiple (I think 4)
levels of inlining
Given that memcpy is not generated by the compiler, The first question is
which pass and when should handle (1).
Carrot,
The reduced test case that you have provided, is a very good use case for vector::emplace_back. Is this an option for you to change the source code, and use emplace_back instead of push_back?
I have not checked the PPC code generated for emplace_back, but given the functionality, I expect it to solve exactly this problem. I thought to check with you if source code change is an option or not. If it is, I can double check to ensure we generate good code for emplace_back.
(Also I don't think the problem here is limited to PPC. I believe if we want to fix this, it should be done entirely in the mid-end. Maybe GVN? I hope we can avoid fixing the problem in the compiler though)
from a quick look at the assembly generated, It seems that using emplace_back,
fixes the problem (1) that I mentioned in comment (1). The problem (2) is
probably still there. ( see my comments marked with *****)
Code with emplace_back:
.LBB0_6: # %invoke.cont3
ld 3, 128(31)
ld 4, 120(31)
ld 6, 112(31)
std 28, 184(31)
sub 4, 3, 4
ld 3, 104(31)
rldicl 5, 4, 61, 3
sradi 4, 29, 3
stw 5, 180(31)
std 4, 168(31)
cmpld 3, 6
beq 0, .LBB0_8
# BB#7: # %if.then.i.i
addi 29, 3, 16
stw 5, 8(3)
std 28, 0(3)
stw 4, 12(3)
std 29, 104(31)
b .LBB0_9
.LBB0_8: # %if.else.i.i
.Ltmp5:
addi 3, 31, 96
addi 4, 31, 184
addi 5, 31, 180
addi 6, 31, 168
bl _ZNSt6vectorI1OSaIS0_EE19_M_emplace_back_auxIJRlRKimEEEvDpOT_
nop
.LBB0_6: // This block still contains all the stores *****
ld 3, 128(31)
ld 4, 120(31)
ld 5, 112(31)
rldicl 6, 29, 61, 3
std 28, 176(31)
stw 6, 188(31)
sub 4, 3, 4
ld 3, 104(31)
rldicl 4, 4, 61, 3
stw 4, 184(31)
cmpld 3, 5
beq 0, .LBB0_8
# BB#7: # %if.then.i.i
addi 4, 31, 176
lxvd2x 0, 0, 4 // This load is gone. *****
stxvd2x 0, 0, 3
ori 2, 2, 0
ld 3, 104(31) // This load is gone *****
addi 3, 3, 16
std 3, 104(31)
b .LBB0_9
.LBB0_8: # %if.else.i.i
.Ltmp5:
addi 3, 31, 96
addi 4, 31, 176
bl _ZNSt6vectorI1OSaIS0_EE19_M_emplace_back_auxIJRKS0_EEEvDpOT_
nop
.Ltmp6:
In comment 3 I forgot to indicate where the code with push_back starts. Look at the basic block label .LBB0_6: which is the starting point of each code segment.
(In reply to comment #2)
> Carrot,
>
> The reduced test case that you have provided, is a very good use case for
> vector::emplace_back. Is this an option for you to change the source code,
> and use emplace_back instead of push_back?
>
> I have not checked the PPC code generated for emplace_back, but given the
> functionality, I expect it to solve exactly this problem. I thought to check
> with you if source code change is an option or not. If it is, I can double
> check to ensure we generate good code for emplace_back.
>
> (Also I don't think the problem here is limited to PPC. I believe if we want
> to fix this, it should be done entirely in the mid-end. Maybe GVN? I hope we
> can avoid fixing the problem in the compiler though)
I agree with you this is a mid-end problem. But unfortunately I can't modify
the source code.
Definitely different code generation now, haven't looked at the performance implications though.