hecmay closed this issue 3 years ago
@Hecmay What is read_channel_intel? Note that we should not have target-specific primitives in our IR. These are supposed to be handled by the different back-ends.
I think he is showing the generated Intel OpenCL code, not the IR. For IR, we only have channel_read and channel_write. The final implementation is back-end dependent.
read_channel_intel is the API function provided in the Intel FPGA SDK for OpenCL to read data from a streaming channel. It corresponds to the StreamStmt or StreamExpr IR node in HeteroCL IR. The IR representation is similar to the generated code.
def grad_weight_y(handle64(grad_weight_y.pack[436*1024]), handle64(grad_weight_y.y_filt[436*1024])) {
  // attr [c_buf_5] storage_scope = "local"
  allocate c_buf_5[int32 * 436 * 1024]
  // attr [c_buf_4] storage_scope = "local"
  allocate c_buf_4[int32 * 436 * 1024]
  // attr [g_f] storage_scope = "global"
  allocate g_f[float32 * 7]
  produce g_f {
    // attr [0] extern_scope = 0
    g_f[0] = 0.075500f
    g_f[1] = 0.133000f
    g_f[2] = 0.186900f
    g_f[3] = 0.290300f
    g_f[4] = 0.186900f
    g_f[5] = 0.133000f
    g_f[6] = 0.075500f
  }
  for (y, 0, 430) {
    for (x, 0, 1024) {
      allocate reducer2[float32 * 1]
      reducer2[0] = 0.000000f
      for (rdx, 0, 7) {
        reducer2[0] = ((c_buf_4.read()*g_f[rdx]) + reducer2[0])
      }
      grad_weight_y.y_filt[((x + (y*1024)) + 3072)] = reducer2[0]
    }
  }
  pipelined (buf_1, 0, 1024) {
    for (buf_0, 0, 436) {
      c_buf_5.write(grad_weight_y.y_filt[(buf_1 + (buf_0*1024))])
    }
  }
}
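To make the split between generic IR nodes and back-end output concrete, here is a minimal Python sketch of how one stream node could be printed differently per target. It is only an illustration: the node fields, the emit function, and the back-end names "aocl" and "vhls" are hypothetical and are not HeteroCL's actual code generator. The names read_channel_intel and write_channel_intel are the Intel FPGA SDK for OpenCL channel calls, and the .read()/.write() form assumes an hls::stream-style interface.

```python
from dataclasses import dataclass


@dataclass
class StreamExpr:
    """Generic 'read from channel' node (fields are hypothetical)."""
    channel: str


@dataclass
class StreamStmt:
    """Generic 'write to channel' node (fields are hypothetical)."""
    channel: str
    value: str


def emit(node, backend):
    """Return target-specific code text for a generic stream node."""
    if backend == "aocl":  # Intel FPGA SDK for OpenCL
        if isinstance(node, StreamExpr):
            return f"read_channel_intel({node.channel})"
        if isinstance(node, StreamStmt):
            return f"write_channel_intel({node.channel}, {node.value})"
    elif backend == "vhls":  # assumes an hls::stream-style interface
        if isinstance(node, StreamExpr):
            return f"{node.channel}.read()"
        if isinstance(node, StreamStmt):
            return f"{node.channel}.write({node.value})"
    raise ValueError(f"unsupported node/backend: {node!r}, {backend!r}")


# The same IR node yields different text depending on the back-end.
print(emit(StreamExpr("c_buf_4"), "aocl"))
print(emit(StreamExpr("c_buf_4"), "vhls"))
```

The IR itself never mentions read_channel_intel in this sketch; only the "aocl" branch of the emitter does, which is the separation being asked for in this thread.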
I see. I was confused by "StreamExpr (i.e., read_channel_intel)". These two are not supposed to be listed in parallel then.
This issue is closed because the problem no longer exists. In the current implementation, s.to() only inserts annotations into the IR instead of mutating the IR directly. The lowering function replaces the annotated Load IR nodes with StreamExpr nodes after the reuse buffer is inserted into the IR. In other words, the reuse pattern detection function only needs to handle Load IR nodes, and later passes replace the annotated Load IR nodes with StreamExpr nodes.
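The ordering described above can be sketched with a toy IR in Python. This is an assumption-laden illustration, not HeteroCL's implementation: the Load and StreamExpr classes, the stream flag, and the functions annotate_stream, find_reuse, and lower_streams are all hypothetical stand-ins, and the toy reuse detection only reports reusable buffers rather than inserting a reuse buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    buffer: str
    index: int
    stream: bool = False  # annotation only; the node is still a Load


@dataclass(frozen=True)
class StreamExpr:
    buffer: str


def annotate_stream(nodes, buffer):
    """Stand-in for the annotation step: mark Loads of `buffer`, keep them as Loads."""
    return [Load(n.buffer, n.index, True) if n.buffer == buffer else n
            for n in nodes]


def find_reuse(nodes):
    """Toy reuse detection: buffers that are read at more than one index."""
    seen = {}
    for n in nodes:
        if not isinstance(n, Load):
            # Fail loudly instead of crashing on an unexpected node type.
            raise TypeError("reuse detection only understands Load nodes")
        seen.setdefault(n.buffer, set()).add(n.index)
    return sorted(b for b, idx in seen.items() if len(idx) > 1)


def lower_streams(nodes):
    """Stand-in for lowering: annotated Loads become StreamExpr nodes."""
    return [StreamExpr(n.buffer) if isinstance(n, Load) and n.stream else n
            for n in nodes]


body = [Load("img", i) for i in range(3)] + [Load("g_f", 0)]
annotated = annotate_stream(body, "img")
reuse = find_reuse(annotated)       # still sees plain Load nodes
lowered = lower_streams(annotated)  # runs only after reuse analysis
print(reuse, lowered)
```

Running find_reuse on the lowered list instead would hit the TypeError branch, which is the toy analogue of the original failure: the analysis meets a StreamExpr where it expects a Load.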
For now, reuse_at can only handle the reuse pattern found in Load expressions. However, the streaming IR pass replaces some potentially reusable Load expressions with StreamExpr expressions, making the generate_reuse_buffer IR pass crash with a segmentation fault. An example of the generated Intel OpenCL code with a streaming channel is as follows. The input image is replaced with StreamExpr in this case (i.e., read_channel_intel(c_buf_4)), so applying reuse_at on this input results in a segmentation fault.