Closed Quuxplusone closed 14 years ago
Bugzilla Link | PR1075 |
Status | RESOLVED FIXED |
Importance | P normal |
Reported by | Dan Gohman (llvm@sunfishcode.online) |
Reported on | 2007-01-04 10:35:07 -0800 |
Last modified on | 2010-02-22 12:44:26 -0800 |
Version | trunk |
Hardware | PC Linux |
CC | clattner@nondot.org, llvm-bugs@lists.llvm.org |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
I am going to take a closer look at his. But I don't see the problem you are
describing with the given example:
*** Final schedule ***
SU(0): 0x7b048a0: ch = EntryToken
SU(11): 0x7b055a0: f32,ch = MOVSSrm 0x7b04eb0, 0x7b05c20, 0x7b04e50, 0x7b053e0,
0x7b048a0
SU(9): 0x7b04fe0: f32,ch = MULSSrm 0x7b055a0, 0x7b04e50, 0x7b05c20, 0x7b04e50,
0x7b057c0, 0x7b048a0
SU(10): 0x7b04e00: f32,ch = MULSSrm 0x7b055a0, 0x7b04e50, 0x7b05c20, 0x7b04e50,
0x7b05540, 0x7b048a0
SU(1): 0x7b05090: f32 = ADDSSrr 0x7b04e00, 0x7b04fe0
SU(8): 0x7b05de0: f32,ch = MULSSrm 0x7b055a0, 0x7b04e50, 0x7b05c20, 0x7b04e50,
0x7b05a80, 0x7b048a0
SU(2): 0x7b05110: f32 = ADDSSrr 0x7b05090, 0x7b05de0
SU(7): 0x7b06100: f32,ch = MULSSrm 0x7b055a0, 0x7b04e50, 0x7b05c20, 0x7b04e50,
0x7b05d80, 0x7b048a0
SU(3): 0x7b05190: f32 = ADDSSrr 0x7b05110, 0x7b06100
SU(6): 0x7b05440: ch = MOVSSmr 0x7b04f50, 0x7b05c20, 0x7b04e50, 0x7b053e0,
0x7b05190, 0x7b048a0
SU(5): 0x7b06240: f64,ch = FpLD32m 0x7b04f50, 0x7b05c20, 0x7b04e50, 0x7b053e0,
0x7b05440
SU(4): 0x7b04ba0: ch = RET 0x7b05200, 0x7b05200:1
0x7b05200: ch,flag = FpSETRESULT 0x7b06240, 0x7b06240:1
foo:
subl $4, %esp
movss 8(%esp), %xmm0
movaps %xmm0, %xmm1
mulss .LCPI1_0, %xmm1
movaps %xmm0, %xmm2
mulss .LCPI1_1, %xmm2
addss %xmm1, %xmm2
movaps %xmm0, %xmm1
mulss .LCPI1_2, %xmm1
addss %xmm1, %xmm2
mulss .LCPI1_3, %xmm0
addss %xmm0, %xmm2
movss %xmm2, (%esp)
flds (%esp)
addl $4, %esp
ret
Multiplies are intermixed with the adds. Dan, are you working with tot?
Hmmm... I see there are some issues. Hopefully I can fix the general cases
without breaking the special ones.
I used -march=x86-64 above; the code you posted looks like 32-bit code. Offhand
I don't know why it schedules differently on x86 and x86-64; maybe the special
cases are applying differently.
Here's a case that shows the problem on both x86 and x86-64. In this test, the
div instructions should precede the mul instructions, according to pure
Sethi-Ullman numbering.
float %foo(float* %a) {
entry:
%tmp = load float* %a
%tmp4 = getelementptr float* %a, int 1
%tmp5 = load float* %tmp4
%tmp6 = mul float %tmp, %tmp5
%tmp8 = getelementptr float* %a, int 2
%tmp9 = load float* %tmp8
%tmp11 = getelementptr float* %a, int 3
%tmp12 = load float* %tmp11
%tmp13 = mul float %tmp9, %tmp12
%tmp14 = mul float %tmp6, %tmp13
%tmp16 = getelementptr float* %a, int 4
%tmp17 = load float* %tmp16
%tmp19 = getelementptr float* %a, int 5
%tmp20 = load float* %tmp19
%tmp21 = fdiv float %tmp17, %tmp20
%tmp23 = getelementptr float* %a, int 6
%tmp24 = load float* %tmp23
%tmp26 = getelementptr float* %a, int 7
%tmp27 = load float* %tmp26
%tmp28 = fdiv float %tmp24, %tmp27
%tmp29 = fdiv float %tmp21, %tmp28
%tmp31 = getelementptr float* %a, int 8
%tmp32 = load float* %tmp31
%tmp34 = getelementptr float* %a, int 9
%tmp35 = load float* %tmp34
%tmp36 = fdiv float %tmp32, %tmp35
%tmp38 = getelementptr float* %a, int 10
%tmp39 = load float* %tmp38
%tmp41 = getelementptr float* %a, int 11
%tmp42 = load float* %tmp41
%tmp43 = fdiv float %tmp39, %tmp42
%tmp44 = fdiv float %tmp36, %tmp43
%tmp45 = fdiv float %tmp29, %tmp44
%tmp46 = add float %tmp14, %tmp45
ret float %tmp46
}
Yeah, x86 and x86-64 cases scheduled differently because of calling convention
differences (x86 case has an extra store).
I have fixed this. Doing some tests now.
Fixed.
Thanks Evan, please add a test for this. :)
ah, you did, it's Regression/CodeGen/X86/2007-01-08-InstrSched.ll
Thanks :)