Closed: kk2049 closed this issue 2 months ago
@comaniac Sorry to bother you. (I really appreciate your help with te.gradient a few months ago in #8991.) I wonder if I can get your help again with this problem. I am confused by this bug and have no idea how to fix it. Thanks a lot!!
It looks like the auto-scheduler has trouble generating the schedule sketch for this workload. You could first try to build and run the workload on CPU without tuning to see whether we can narrow down the problem. If that fails, then something is wrong with the workload itself or with te.gradient. Otherwise, we can investigate the compute DAG to see why the auto-scheduler fails on the workload generated by te.gradient.
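For the CPU check, something like the following should be enough. This is just a sketch: the tiny matmul plus te.gradient below stands in for your winograd workload; the point is the untuned create_schedule / build / run pattern on the llvm target.

import numpy as np
import tvm
from tvm import te

# Stand-in workload: a small matmul differentiated with te.gradient.
N, M, K = 8, 8, 8
data = te.placeholder((N, K), name="data")
weight = te.placeholder((K, M), name="weight")
k = te.reduce_axis((0, K), name="k")
out = te.compute((N, M), lambda i, j: te.sum(data[i, k] * weight[k, j], axis=k), name="out")

dy = te.placeholder((N, M), name="dy")                 # upstream gradient
[d_data] = te.gradient(out, [data], head=dy)           # backward w.r.t. data

s = te.create_schedule(d_data.op)                      # default, untuned schedule
mod = tvm.build(s, [data, weight, dy, d_data], target="llvm")

# Run once on CPU with random inputs.
dev = tvm.cpu()
args = [tvm.nd.array(np.random.rand(*shape).astype("float32"), dev)
        for shape in [(N, K), (K, M), (N, M), (N, K)]]
mod(*args)
print("CPU run OK")

If your real workload passes this kind of untuned CPU build-and-run, the compute definition and te.gradient output are at least valid TE, and the problem is more likely on the auto-scheduler side.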
@comaniac Thanks for your reply! I tried to run this workload with tvm.target.Target("llvm"), and it builds and runs successfully. So I switched back to tvm.target.Target("cuda") and printed the compute DAG. It looks like this:
Computational DAG:
kernel = PLACEHOLDER [512, 512, 3, 3]
G(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 3) == 2)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 3) == 1)), ..(OMITTED).. (floormod(i, 4) == 0) && (floormod(j, 3) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 3) == 0)), 1f, 0f))))))))))))
my_kernel_pack(eps, nu, ci, co) += ((kernel[co, ci, r_kh, r_kw]*G[eps, r_kh])*G[nu, r_kw])
data = PLACEHOLDER [1, 512, 7, 7]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i2 >= 1) && (i2 < 8)) && (i3 >= 1)) && (i3 < 8)), data[i0, i1, (i2 - 1), (i3 - 1)], 0f)
my_d(c, p, eps_1, nu_1) = data_pad[floordiv(p, 16), c, ((floormod(floordiv(p, 4), 4)*2) + eps_1), ((floormod(p, 4)*2) + nu_1)]
B(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 4) == 2)), ..(OMITTED).. ormod(i, 4) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))
my_data_pack(eps, nu, ci, p) += ((my_d[ci, p, r_a, r_b]*B[r_a, eps])*B[r_b, nu])
my_bgemm(eps, nu, co, p) += (my_kernel_pack[eps, nu, ci, co]*my_data_pack[eps, nu, ci, p])
A(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 2) == 1)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 2) == 0)), ..(OMITTED).. ct(((floormod(i, 4) == 0) && (floormod(j, 2) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 2) == 0)), 1f, 0f))))))))
my_inverse(co, p, vh, vw) += ((my_bgemm[r_a_2, r_b_2, co, p]*A[r_a_2, vh])*A[r_b_2, vw])
my_output(n, co, h, w) = my_inverse[co, ((((n*4)*4) + (floordiv(h, 2)*4)) + floordiv(w, 2)), floormod(h, 2), floormod(w, 2)]
input2_dy = PLACEHOLDER [1, 512, 7, 7]
my_output.my_inverse.grad(ax0, ax1, ax2, ax3) = select((((((((ax2*4) + (floordiv((7 + (ax1*-2)), 8)*-8)) <= 24) && (((ax1*-2) + ..(OMITTED).. ) <= 15)), input2_dy[0, ax0, (ax2 + (floordiv((7 + (ax1*-2)), 8)*-2)), (((floordiv((7 + (ax1*-2)), 8)*8) + (ax1*2)) + ax3)], 0f)
extracted_tensor(n0_n0_vh.shifted.shifted, n1_n1_vw.shifted.shifted, n2_n2_jac_i0.shifted.shifted, n3_n3_jac_i1.shifted.shifted) = (A[n2_n2_jac_i0.shifted.shifted, n0_n0_vh.shifted.shifted]*A[n3_n3_jac_i1.shifted.shifted, n1_n1_vw.shifted.shifted])
my_inverse.my_bgemm.grad(ax0, ax1, ax2, ax3) += (my_output.my_inverse.grad[ax2, ax3, n0_n0_k2.shifted.shifted, n1_n1_k3.shifted.shifted]*extracted_tensor[n0_n0_k2.shifted.shifted, n1_n1_k3.shifted.shifted, ax0, ax1])
my_bgemm.my_data_pack.grad(ax0, ax1, ax2, ax3) += (my_inverse.my_bgemm.grad[ax0, ax1, n0_n0_k2.shifted.shifted, ax3]*my_kernel_pack[ax0, ax1, ax2, n0_n0_k2.shifted.shifted])
extracted_tensor(n0_n0_eps.shifted.shifted, n1_n1_nu.shifted.shifted, n4_n4_jac_i2.shifted.shifted, n5_n5_jac_i3.shifted.shifted) = (B[n4_n4_jac_i2.shifted.shifted, n0_n0_eps.shifted.shifted]*B[n5_n5_jac_i3.shifted.shifted, n1_n1_nu.shifted.shifted])
my_data_pack.my_d.grad(ax0, ax1, ax2, ax3) += (my_bgemm.my_data_pack.grad[n0_n0_k0.shifted.shifted, n1_n1_k1.shifted.shifted, ax0, ax1]*extracted_tensor[n0_n0_k0.shifted.shifted, n1_n1_k1.shifted.shifted, ax2, ax3])
data_pad.data.grad(ax0, ax1, ax2, ax3) += my_data_pack.my_d.grad[ax1, (((((floordiv((ax2 + 1), 2) + n0_n0_fdiv1.shifted.shifted) ..(OMITTED).. ormod((ax2 + 1), 2) + (n0_n0_fdiv1.shifted.shifted*-2)) + 2), ((floormod((ax3 + 1), 2) + (n1_n1_fmod1.shifted.shifted*-2)) + 2)]
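(For reference, this is roughly how I printed the DAG above. The toy_workload function here is only a stand-in so the snippet is self-contained; in my script the registered winograd-plus-gradient workload takes its place.)

import tvm
from tvm import auto_scheduler, te

# Stand-in registered workload; my real workload function replaces this.
@auto_scheduler.register_workload
def toy_workload(n, m, k):
    X = te.placeholder((n, k), name="X")
    W = te.placeholder((k, m), name="W")
    r = te.reduce_axis((0, k), name="r")
    Y = te.compute((n, m), lambda i, j: te.sum(X[i, r] * W[r, j], axis=r), name="Y")
    return [X, W, Y]

task = auto_scheduler.SearchTask(
    func=toy_workload, args=(8, 8, 8), target=tvm.target.Target("cuda")
)
print("Computational DAG:")
print(task.compute_dag)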
I have tried to inspect this DAG myself but failed to find anything useful. Maybe you can spot something in it?
Thanks a lot for your help!!!
My problem
I am trying to use the auto-scheduler to generate CUDA source code for the backward stage of an NCHW winograd_conv2d. Because of some issues with topi.cuda.conv2d_winograd.winograd_cuda, I copied part of its code to build my own workload.
This workload works without te.gradient, and I can successfully get source code for the forward stage. But once I add te.gradient, the workload no longer works and I get the error message below:
Check failed: (!repl_op.same_as(s->op)) is false: Cannot find Tensor(shape=[4, 2], op.name=A) in the inputs of compute(extracted_tensor.d.shared, ......
I am really confused now. The fact that the forward-stage codegen works suggests that my workload is correct, at least in some sense, so I suspect this may be caused by a bug in TVM, but I am not sure.
Maybe someone can help me figure out whether this is a TVM bug.
Thanks a lot!!!
Expected behavior
The auto-scheduler should find a valid schedule for this workload.
Actual behavior
I get the error shown above as soon as tuning starts.
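The tuning entry point that hits the error looks roughly like this (a sketch; the workload name, argument tuple, and log-file name are placeholders for the ones in the full script below):

import tvm
from tvm import auto_scheduler

# "winograd_grad_workload" stands in for my registered workload function.
task = auto_scheduler.SearchTask(
    func=winograd_grad_workload,
    args=(1, 512, 7, 7, 512, 3, 3),
    target=tvm.target.Target("cuda"),
)

log_file = "winograd_grad.json"
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=64,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    verbose=2,
)
task.tune(tune_option)                 # the "Cannot find Tensor ... A" check fires here
sch, args = task.apply_best(log_file)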
Environment
My system is Ubuntu 16.04 with CUDA 10.2. My TVM version is 0.8.0, built from the source code available on the "Download Apache TVM Source Code" web page.
Steps to reproduce
I am sorry for posting such a long piece of code, but it is needed to make sure the bug can be reproduced. I tried to cut it down to a smaller reproducer, but the bug is only triggered by the full version.