hecmay opened this issue 4 years ago
I think the problem here is the axis. According to the error message, it seems like the first `reuse_at` is incorrect: there is no reuse across the 0th dimension (i.e., the batches), which makes sense.
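To make the batch-axis point concrete, here is a small plain-Python sketch (not HeteroCL; the 3-wide stencil is an illustrative assumption) of which inputs a stencil window touches as it slides along each axis:

```python
# Sketch: why a stencil has reuse along width but not along batch.
# Assumed access pattern: out[b][x] = f(inp[b][x], inp[b][x+1], inp[b][x+2])

def stencil_inputs(b, x):
    """Set of inp indices read when computing out[b][x]."""
    return {(b, x + k) for k in range(3)}

# One step along the width axis: two of three inputs are shared,
# so a line/window buffer can reuse them.
shared_w = stencil_inputs(0, 0) & stencil_inputs(0, 1)
print(len(shared_w))  # 2

# One step along the batch axis: the input sets are disjoint,
# so there is nothing for reuse_at to exploit on axis 0.
shared_b = stencil_inputs(0, 0) & stencil_inputs(1, 0)
print(len(shared_b))  # 0
```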
That makes sense. Actually, `reuse_at` won't error out with a placeholder input (i.e., the first primitive), even when it is asked to find a reuse pattern at the batch level. The error message comes from the second `reuse_at` primitive, where the input is a tensor. Both work at the height or width level.
I suggest we leave this issue open so we know we are missing support for non-unit-stride stencils. We should also document all the existing limitations of each customization primitive.
Also, do we report an error message for `reuse_at()` when there are no reuse opportunities?
My previous answer was wrong, so I deleted it. This issue is caused exactly by the lack of reuse opportunities, not by a non-unit stride. As for the limitation, it is already documented in our online documentation. You can see it here.
Good to know. But does the compiler emit a proper error when `reuse_at` does not apply? Also, is there a fundamental challenge that prevents us from supporting non-unit strides?
For the first question, as you can see from the error message in the first post, it clearly states that axis `nn` has no reuse opportunities. For other kinds of limitation, the compiler emits different messages. For the second question, the answer is no; we just need more engineering effort.
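To illustrate why non-unit stride is an engineering gap rather than a fundamental one, here is a plain-Python sketch (window and stride values are illustrative assumptions) of how much data two consecutive 1-D stencil windows share at different strides:

```python
def overlap(window, stride):
    """Number of input elements shared by two consecutive 1-D stencil
    windows of the given width when the output index advances by `stride`."""
    return max(0, window - stride)

print(overlap(3, 1))  # unit stride: 2 elements reusable per step
print(overlap(3, 2))  # stride 2: still 1 reusable element
print(overlap(5, 5))  # stride >= window: nothing left to reuse
```

A reuse buffer for stride `s` simply shifts by `s` elements per output instead of 1, so the same line-buffer machinery generalizes; it just has to be implemented.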
Also, another limitation: `reuse_at` does not take effect when combined with the `compute_at` primitive. For example, consider the following snippet:
```python
s[conv2].compute_at(s[tanh2], tanh2.axis[3])    # combine CONV with tanh
s.reuse_at(pool1._op, s[conv2], conv2.axis[2])  # line buffer at index y
```
Here I want to merge the conv2d stage `conv2` into the activation stage `tanh2` with `compute_at`, and then reuse the max-pooled input from the previous stage (i.e., `pool1`, max-pooled from `conv1`). The resulting IR is not as expected (a reuse buffer is allocated but never used, and no error message is thrown):
```
// attr [pool1.reuse] storage_scope = "global"
allocate pool1.reuse[int32 * 1]
// attr [tanh2] storage_scope = "global"
allocate tanh2[int32 * 1000 * 50 * 8 * 8]
produce tanh2 {
  // attr [0] extern_scope = 0
  for "app_name"="tanh" (args, 0, 1000) {
    for (args0, 0, 50) {
      for (args1, 0, 8) {
        for (args2, 0, 8) {
          // attr [conv2] storage_scope = "global"
          allocate conv2[int32 * 1 * 1 * 1 * 1]
          produce conv2 {
            // attr [0] extern_scope = 0
            // attr [reducer2] storage_scope = "global"
            allocate reducer2[float32 * 1]
            produce reducer2 {
              // attr [0] extern_scope = 0
              reducer2[0] = 0.000000f
            }
            for (ra5, 0, 20) {
              for (ra6, 0, 5) {
                for (ra7, 0, 5) {
                  reducer2[0] = (float32((int48(pool1[((((args2 + ra7) + ((args1 + ra6)*12)) + (ra5*144)) + (args*2880))])*fixed48_14(weight_conv2[(((ra7 + (ra6*5)) + (ra5*25)) + (args0*500))]))) + reducer2[0])
                }
              }
            }
            conv2[0] = int32(reducer2[0])
          }
          tanh2[(((args2 + (args1*8)) + (args0*64)) + (args*3200))] = int32(tanh(float64(conv2[0])))
        }
      }
    }
  }
}
```
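For comparison, here is a back-of-the-envelope estimate (assuming the usual line-buffer layout; sizes read off the loop bounds and index math above) of what a real reuse buffer for this 5x5 window would have to hold, versus the size-1 `pool1.reuse` actually allocated:

```python
# Sizes taken from the IR above; the line-buffer layout is an assumption.
window_h, window_w = 5, 5  # the ra6, ra7 loop bounds
input_w = 12               # row stride of pool1 in the index expression

line_buffer_elems = window_h * input_w     # holds window_h input rows
window_buffer_elems = window_h * window_w  # the sliding window itself

print(line_buffer_elems)    # 60
print(window_buffer_elems)  # 25, both far from the 1 element allocated
```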
If I use `reuse_at` and then `compute_at`, the program crashes with a segfault.
I am trying to reuse the input image in a conv2d layer in the LeNet example. The `reuse_at` primitive works fine with placeholder inputs (i.e., `input_image` in the first conv2d). However, when the max-pooled result is passed to the second conv2d layer, no reuse pattern is found for it. The error message is as follows: