Closed Quuxplusone closed 4 years ago
Attached tc_notempty_emptyrange.ll
(628 bytes, text/plain): reduced (very small) test case
The problematic interval is %4. Before coalescing it looks like this: %4 [80r,256r:0) 0@80r weight:0.000000e+00
After coalescing, %4 has been promoted from a gpr32 to a gpr128 while being merged with the various copies. Yet only the low 32-bit lane is used and the other lanes are empty: %4 [80r,256r:0) 0@80r L0000000000000007 EMPTY L0000000000000008 [80r,256r:0) 0@80r weight:0.000000e+00
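To make the symptom easier to spot in a longer dump, here is a minimal sketch that scans a LiveInterval dump line for subranges printed as EMPTY. The regular expression is an assumption based only on the dump text quoted in this report (an `L` followed by a 16-digit hex lane mask, then either `EMPTY` or a segment); `empty_lanes` is a hypothetical helper, not part of LLVM.

```python
import re

# Matches one subrange entry as it appears in the dump quoted above:
# "L" + 16 hex digits, followed by either EMPTY or the first segment.
# (Assumed format, derived from this report only.)
SUBRANGE_RE = re.compile(r"L([0-9A-Fa-f]{16})\s+(EMPTY|\[[^)]*\))")


def empty_lanes(dump):
    """Return the lane masks (as ints) of subranges printed as EMPTY."""
    return [int(mask, 16)
            for mask, body in SUBRANGE_RE.findall(dump)
            if body == "EMPTY"]


after_coalescing = ("%4 [80r,256r:0) 0@80r L0000000000000007 EMPTY "
                    "L0000000000000008 [80r,256r:0) 0@80r weight:0.000000e+00")
print([hex(m) for m in empty_lanes(after_coalescing)])  # prints ['0x7']
```

For the interval above this reports lane mask 0x7, i.e. the lanes other than the one that is actually used.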
The code looks like this after coalescing:
80B undef %4.subreg_l64:gr128bit = LGHI 0
<...>
256B STC %4.subreg_l32:gr128bit, %2:addr64bit, 19, $noreg :: (store 1 into i8* getelementptr inbounds ([7 x [10 x i8]], [7 x [10 x i8]]* @g_222, i64 0, i64 1, i64 9))
I would have expected %4 to have dead defs at 80B for the unused lanes instead of keeping those lanes empty. So this is probably an issue with how we create the subranges when we rematerialize here.
For the record, the option -terminal-rule is needed because otherwise %4 gets merged with %3, and the register that gets rematerialized is %9, which is already a 128-bit value, so we don't create empty lanes when we rematerialize.
I concur. The code that updates the subranges when the definition has a subreg only cleans up the empty lanes that are unused and not defined; it doesn't assign the proper dead def to the ones that are unused but defined.
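The distinction between the two kinds of empty lanes can be sketched with a toy model. This is not the attached patch and not LLVM code: `SubRange`, `fix_subranges`, and the mask `HYPOTHETICAL_DEF_MASK` are made up for illustration. In particular, the report does not give the lane mask of subreg_l64, so the value below is chosen only so that it overlaps both subranges of %4.

```python
from dataclasses import dataclass, field


@dataclass
class SubRange:
    lane_mask: int
    # Live segments as (start, end) slot strings; an empty list models EMPTY.
    segments: list = field(default_factory=list)


def fix_subranges(subranges, def_mask, def_slot):
    """Toy cleanup of the subranges after a rematerialized subreg def.

    - empty lane not written by the def: drop the subrange
    - empty lane written by the def but never read: give it a dead def
    Non-empty subranges are left untouched in this simplified model.
    """
    result = []
    for sr in subranges:
        defined = (sr.lane_mask & def_mask) != 0
        if not sr.segments:
            if not defined:
                continue  # unused and not defined: nothing to keep
            # unused but defined: dead def at the defining instruction
            sr = SubRange(sr.lane_mask, [(f"{def_slot}r", f"{def_slot}d")])
        result.append(sr)
    return result


# Assumed mask for the def's subreg; only its overlap with 0x7 and 0x8 matters.
HYPOTHETICAL_DEF_MASK = 0xC

subranges = [SubRange(0x7), SubRange(0x8, [("80r", "256r")])]
for sr in fix_subranges(subranges, HYPOTHETICAL_DEF_MASK, 80):
    print(f"L{sr.lane_mask:016X}", sr.segments)
```

Under these assumptions the 0x7 subrange ends up with a single segment from 80r to 80d, while the 0x8 subrange keeps its segment from 80r to 256r.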
Attaching a tentative patch.
Attached regcoal.patch
(1434 bytes, text/plain): Tentative patch
With the attached patch, %4 looks much more reasonable after coalescing:
%4 [80r,256r:0) 0@80r L0000000000000007 [80r,80d:0) 0@80r L0000000000000008 [80r,256r:0) 0@80r weight:0.000000e+00
Fixed in
https://github.com/llvm/llvm-project.git
eb9ca9da3e9..ccb3c8e8613 master -> master