Open jjsjann123 opened 4 months ago
In the example above:
(gdb) print id->fusion()->printMath(0)
Inputs:
T0_g[ bS0{1}, iS1{10} ], float
T1_g[ iS2{4}, iS3{10} ], float
Outputs:
T8_g[ iS18{5}, iS19{10} ], float
%kernel_math {
Resize: bS10{1}rf by 0 and 4 -> iS11{5}rf
i21 = 0 + 1;
i30 = i21 + 4;
i37 = i21 + 4;
Resize: iS13{4}rf by ( 0 + 1 ) and 0 -> iS14{( ( 0 + 1 ) + 4 )}rf
T2_l[ bS4{1}, iS5{10} ]
= relu(T0_g[ bS0{1}, iS1{10} ]);
T3_l[ bS6{1}, iS7{10} ]
= -T0_g[ bS0{1}, iS1{10} ];
T4_l[ bS8{1}, iS9{10} ]
= T2_l[ bS4{1}, iS5{10} ]
+ T3_l[ bS6{1}, iS7{10} ];
T5_l[ iS11{5}rf, iS12{10} ]
= pad( T4_l[ bS8{1}, iS9{10} ], {0, 4, 0, 0} )
T6_l[ iS14{( ( 0 + 1 ) + 4 )}rf, iS15{10} ]
= pad( T1_g[ iS2{4}, iS3{10} ], {i21, 0, 0, 0} )
T7_l[ iS16{5}, iS17{10} ]
= cat( T5_l[ iS11{5}rf, iS12{10} ], T6_l[ iS14{( ( 0 + 1 ) + 4 )}rf, iS15{10} ], 0 )
T8_g[ iS18{5}, iS19{10} ]
= T7_l[ iS16{5}, iS17{10} ]
+ T0_g[ bS0{1}, iS1{10} ];
b50 = blockIdx.x >= 0;
b52 = gridDim.x > 0;
b54 = blockIdx.x < gridDim.x;
b56 = blockIdx.y >= 0;
b58 = gridDim.y > 0;
b60 = blockIdx.y < gridDim.y;
b62 = blockIdx.z >= 0;
b64 = gridDim.z > 0;
b66 = blockIdx.z < gridDim.z;
b68 = threadIdx.x >= 0;
b70 = blockDim.x > 0;
b72 = threadIdx.x < blockDim.x;
b74 = threadIdx.y >= 0;
b76 = blockDim.y > 0;
b78 = threadIdx.y < blockDim.y;
b80 = threadIdx.z >= 0;
b82 = blockDim.z > 0;
b84 = threadIdx.z < blockDim.z;
s86 = getMetaData(T0_g[ bS0{1}, iS1{10} ])
s87 = getMetaData(T1_g[ iS2{4}, iS3{10} ])
}
No potential concrete_id's found for disjoint set { bS0{1}; bS4{1}; bS6{1}; bS8{1}; bS10{1}rf; iS14{( ( 0 + 1 ) + 4 )}rf; iS11{5}rf; iS16{5}; iS18{5} }")
Looks like after the last swap we ended up with an empty maybe_concrete_ids somehow.
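For reference, the extents in the failing disjoint set can be checked by hand from the two Resize lines in the dump. The sketch below only restates that arithmetic. `resizedExtent` is a hypothetical helper, not an nvFuser API, and it assumes a Resize simply adds the left and right expansion to the original extent.

```cpp
#include <cstdint>

// Hypothetical helper (not part of nvFuser): extent of an axis after a
// Resize that adds `left` elements before it and `right` elements after it.
constexpr int64_t resizedExtent(int64_t extent, int64_t left, int64_t right) {
  return left + extent + right;
}

// Resize: bS10{1}rf by 0 and 4 -> iS11{5}rf
static_assert(resizedExtent(1, 0, 4) == 5, "T5 axis 0 should have extent 5");

// Resize: iS13{4}rf by ( 0 + 1 ) and 0 -> iS14{( ( 0 + 1 ) + 4 )}rf
static_assert(resizedExtent(4, 0 + 1, 0) == 5, "T6 axis 0 should have extent 5");
```

Both padded inputs end up with extent 5 on axis 0, matching iS16{5} and iS18{5}. That is consistent with the broadcast IDs and the extent-5 IDs all landing in the same disjoint set in the error message.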
This one still repros. I thought we could use NVFUSER_ENABLE=id_model
to avoid compute_at, but that doesn't seem to help here. cc'ing @naoyam
Repro c++ script.
hits an assert vvv
Full trace: