Open Quuxplusone opened 12 years ago
Attached tsc_s000.ll
(9931 bytes, application/octet-stream): Test case
Looking into this more, it seems that the problem is that the scheduler never
has more than one instruction from which to choose.
The instruction stream is essentially a series of:
%inc15 = or i32 %i.013, 1
%arrayidx.1 = getelementptr inbounds [16000 x double]* @Y, i32 0, i32 %inc15
%1 = load double* %arrayidx.1, align 8, !tbaa !0
%add.1 = fadd double %1, 1.000000e+00
%arrayidx5.1 = getelementptr inbounds [16000 x double]* @X, i32 0, i32 %inc15
store double %add.1, double* %arrayidx5.1, align 8, !tbaa !0
%inc.116 = or i32 %i.013, 2
%arrayidx.2 = getelementptr inbounds [16000 x double]* @Y, i32 0, i32 %inc.116
%2 = load double* %arrayidx.2, align 16, !tbaa !0
%add.2 = fadd double %2, 1.000000e+00
%arrayidx5.2 = getelementptr inbounds [16000 x double]* @X, i32 0, i32 %inc.116
store double %add.2, double* %arrayidx5.2, align 16, !tbaa !0
And during scheduling we have:
Examining Available:
Examining Available:
Height 39: SU(9): 0x34c4db0: ch = STFD 0x34c49b0, 0x34c6ad0, 0x34a2bf0,
0x34c48b0:1<Mem:ST8[%scevgep18](align=16)(tbaa=!"double")> [ORD=126] [ID=9]
*** Scheduling [39]: SU(9): 0x34c4db0: ch = STFD 0x34c49b0, 0x34c6ad0,
0x34a2bf0, 0x34c48b0:1<Mem:ST8[%scevgep18](align=16)(tbaa=!"double")> [ORD=126]
[ID=9]
...
Examining Available:
Height 42: SU(55): 0x34c49b0: f64 = FADD 0x34c48b0, 0x34bb4f0 [ORD=122] [ID=55]
*** Scheduling [42]: SU(55): 0x34c49b0: f64 = FADD 0x34c48b0, 0x34bb4f0
[ORD=122] [ID=55]
...
Examining Available:
Height 48: SU(10): 0x34c48b0: f64,ch = LFD 0x34c6ad0, 0x34a2cf0,
0x34c46b0<Mem:LD8[%scevgep21](align=16)(tbaa=!"double")> [ORD=121] [ID=10]
*** Scheduling [48]: SU(10): 0x34c48b0: f64,ch = LFD 0x34c6ad0, 0x34a2cf0,
0x34c46b0<Mem:LD8[%scevgep21](align=16)(tbaa=!"double")> [ORD=121] [ID=10]
...
Is this an alias-analysis problem?
Is this caused by the TODO in the following comment in
CodeGen/ScheduleDAGInstrs.cpp (BuildSchedGraph)?
// Add chain dependencies.
// Chain dependencies used to enforce memory order should have
// latency of 0 (except for true dependency of Store followed by
// aliased Load... we estimate that with a single cycle of latency
// assuming the hardware will bypass)
// Note that isStoreToStackSlot and isLoadFromStackSLot are not usable
// after stack slots are lowered to actual addresses.
// TODO: Use an AliasAnalysis and do real alias-analysis queries, and
// produce more precise dependence information.
It seems that the answer is no: that piece of code is not executed. Instead, the dependencies seem to come from the SDNodes themselves, which look like this:
SU(6): 0x228add0: ch = STFD 0x228abd0, 0x228a7d0, 0x228d3a0, 0x228a9d0:1<Mem:ST8[%arrayidx5](tbaa=!"double")> [ORD=6] [ID=6]
SU(15): 0x228ec70: i32 = LA 0x228ea70, 0x228e970 [ID=15]
SU(5): 0x228b1d0: f64,ch = LFD 0x228d4a0, 0x228ec70, 0x228add0<Mem:LD8[%arrayidx.1]> [ORD=9] [ID=5]
So the last operand of the load is the result of the store. Because the value type is MVT::Other, ScheduleDAGSDNodes::AddSchedEdges() considers this to be part of the critical chain. Why are the loads and stores (into independent arrays) connected in this way?
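To make the effect concrete, here is a toy model of the situation. It is not LLVM code: ToyNode, buildLoop and maxReadyWidth are made-up names, and the sketch uses a simple top-down list scheduler, whereas the log above comes from a bottom-up one. It only shows that once every load is chained to the previous store, the ready list can never hold more than one node.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a scheduling unit (hypothetical, not an LLVM type).
struct ToyNode {
    std::vector<std::size_t> preds;  // indices of nodes that must be scheduled first
};

// Build `iters` copies of load -> fadd -> store. If chainThroughStore is set,
// each load additionally depends on the previous iteration's store, mimicking
// a load whose chain operand is the result of that store.
std::vector<ToyNode> buildLoop(std::size_t iters, bool chainThroughStore) {
    std::vector<ToyNode> dag;
    for (std::size_t k = 0; k < iters; ++k) {
        const std::size_t load = dag.size();
        ToyNode loadNode;
        if (chainThroughStore && k > 0)
            loadNode.preds.push_back(load - 1);  // previous iteration's store
        dag.push_back(loadNode);

        ToyNode faddNode;
        faddNode.preds.push_back(load);          // fadd uses the loaded value
        dag.push_back(faddNode);

        ToyNode storeNode;
        storeNode.preds.push_back(load + 1);     // store uses the fadd result
        dag.push_back(storeNode);
    }
    return dag;
}

// Schedule the DAG with a simple ready list and return the largest number of
// nodes that were available to choose from at any one time.
std::size_t maxReadyWidth(const std::vector<ToyNode>& dag) {
    std::vector<std::size_t> pending(dag.size());
    std::vector<std::vector<std::size_t>> succs(dag.size());
    for (std::size_t i = 0; i < dag.size(); ++i) {
        pending[i] = dag[i].preds.size();
        for (std::size_t p : dag[i].preds) {
            assert(p < dag.size());
            succs[p].push_back(i);
        }
    }

    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < dag.size(); ++i)
        if (pending[i] == 0)
            ready.push_back(i);

    std::size_t widest = 0;
    while (!ready.empty()) {
        widest = std::max(widest, ready.size());
        const std::size_t n = ready.back();
        ready.pop_back();
        for (std::size_t s : succs[n])
            if (--pending[s] == 0)
                ready.push_back(s);
    }
    return widest;
}
```

In this model, maxReadyWidth(buildLoop(4, true)) is 1, while maxReadyWidth(buildLoop(4, false)) is 4: dropping the store-to-load chain edge is what gives the scheduler something to choose from.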
(In reply to comment #3)
> So the last operand of the load is the result of the store. Because the value
> type is MVT::Other, ScheduleDAGSDNodes::AddSchedEdges() considers this to be
> part of the critical chain. Why are the loads and stores (into independent
> arrays) connected in this way?
That's a known missed optimization.
I am working on a patch to SelectionDAGBuilder to correct this. I'll submit it for review soon.