Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Sibling loops confuse spill placement #23742

Open Quuxplusone opened 9 years ago

Quuxplusone commented 9 years ago
Bugzilla Link PR23743
Status NEW
Importance P normal
Reported by James Molloy (james@jamesmolloy.co.uk)
Reported on 2015-06-03 06:51:33 -0700
Last modified on 2017-06-04 07:20:37 -0700
Version trunk
Hardware PC All
CC arnaud.degrandmaison@arm.com, james@jamesmolloy.co.uk, llvm-bugs@lists.llvm.org, marina.yatsina@intel.com, matze@braunis.de, quentin.colombet@gmail.com, sanjoy@playingwithpointers.com
Fixed by commit(s)
Attachments test.ll (6299 bytes, application/octet-stream)
Blocks
Blocked by
See also
Created attachment 14426
Testcase

In the attached testcase there is one loop with two child loops. The two child
loops are siblings to each other.

A reloaded is inserted in the second sibling loop, instead of a different
register being spilled outside the second loop.

The register being spilled is %vreg7:

selectOrSplit rGPR:%vreg45 [576r,656B:0)[656B,976r:1)[976r,1104B:2)  0@576r
1@656B-phi 2@976r w=2.185477e+05
%R10 is available at cost 1
Only trying the first 10 regs.
should evict: %vreg23 [160r,1184B:0)  0@160r w= 4.682460e+04
should evict: %vreg1 [80r,1184B:0)  0@80r w= 2.164171e+04
should evict: %vreg8 [560r,1104B:0)  0@560r w= 2.003648e+04
should evict: %vreg7 [496r,1104B:0)  0@496r w= 1.771947e+04
evicting %R6 interference: Cascade 4
unassigning %vreg7 from %R6: R6
assigning %vreg45 to %R6: R6 [576r,656B:0)[656B,976r:1)[976r,1104B:2)  0@576r
1@656B-phi 2@976r
queuing new interval: %vreg7 [496r,1104B:0)  0@496r

%vreg7 has the lowest spill cost because it is within an if/else inside the
second loop. However, spilling %vreg23 or %vreg1 would not be as costly as
%vreg7 because both of those registers are live-through the second loop (they
are only used in the first loop).

It appears the spill placement algorithm is aware of loop nest depth (implied
by block frequency) but unaware of loop hierarchy.

Compile with: llc -O3 test.ll and note:

.LBB0_5:                                @ %bb21
                                        @   Parent Loop BB0_2 Depth=1
                                        @ =>  This Inner Loop Header: Depth=2
    sub.w   r8, r7, #2
    cmp r8, r4
    bhi .LBB0_7
@ BB#6:                                 @ %bb27
                                        @   in Loop: Header=BB0_5 Depth=2
    ldrb.w  r5, [r9, r7]
    lsl.w   r2, lr, r11
    eors    r2, r5
    ldr r5, [sp, #8]            @ 4-byte Reload    ***NAUGHTY***
    and.w   lr, r2, r5
Quuxplusone commented 9 years ago

Attached test.ll (6299 bytes, application/octet-stream): Testcase

Quuxplusone commented 9 years ago
Hi James,

I am not sure what you are suggesting here.

When we take the decision of evicting something for vreg45, we have to assume
the worst case scenario, i.e., the evicted register will be spilled around all
uses. In that respect, both vreg23 and verg1 are more expensive I guess (I
haven’t actually checked). The fact that evicting one of those is cheaper may
come from the fact that unlike vreg7, we will be able to split them and have a
cheaper live-range to spill as a result.
Anyhow, we do not know that when taking the decision to evict and I do not see
how we can determine that at that time.

The bottom line is I do not see how the loop hierarchy would help here.

Thanks.
Quuxplusone commented 9 years ago

Hi Quentin,

Indeed, the live range of vreg23/vreg1 should be split, I think. How do you think we should approach this? the spill in the inner loop is avoidable (with perfect allocation) and is obviously bad, so is there anything we can do about it?

I'm not massively well versed in the myriad tradeoffs in register allocation/range-splitting/spill-placement/coalescing.

James

Quuxplusone commented 9 years ago
I have to double check but my hopes are low that we can do something
reasonable. Basically we are hitting a fundamental limitation of the algorithm.
It looks at the current state for deciding and cannot think future (which is
pretty hard).

I believe we could tweak the eviction policy to “fake” the splitting mechanism
and check how expensive that would be to spill the resulting live-range (which
should be shorter, thus likely less expensive).
However, assuming this is doable (faking the splitting may be fun), like all
heuristics, if we get it wrong, e.g., because the split live-ranges would have
different priority and we completely missed what the future will in fact look
like, we can have some dramatic regressions.