An example: suppose a receiver function needs to read data sequentially from the streaming channel, and a sender function, shown in the following code block, writes data (i.e., `calc_x_gradient.grad_x`) into the channel.
We can apply a data reuse schedule to the sender function to exploit data locality, but this makes the index access order inconsistent between the sender and the receiver (i.e., the reader reads `data[x + y*1024]` while the writer writes `data[x + y*1024 -2]`). To avoid the incorrect results this inconsistency would cause, the streaming inference IR pass generates another set of nested loops to write the data into the channel (i.e., `c_buf_2.write(calc_x_gradient.grad_x[(buf_1 + (buf_0*1024))])`). This extra loop nest degrades performance, because the data is traversed a second time just to feed the channel.
As I discussed with Sean, a simple solution is to add an extra if-else statement around the condition block so that the sender and the receiver access indices in the same order.
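To make the idea concrete, here is a minimal Python sketch of the two approaches. It is not the code the IR pass generates. The 3-pixel window, the small sizes `W` and `H`, the zero padding at each row end, the loop bound `W + 2`, and all function names are assumptions for illustration; the channel is modeled as a plain `deque`.

```python
from collections import deque

W, H = 8, 3  # small stand-ins for the 1024-wide image in the text (assumption)


def reference(img):
    """Layout the receiver expects: grad[x + y*W], zero-padded at each row end."""
    out = [0] * (W * H)
    for y in range(H):
        for x in range(W - 2):
            out[x + y * W] = img[y][x + 2] - img[y][x]
    return out


def sender_two_pass(img, channel):
    """Reuse schedule, followed by a second loop nest that only feeds the channel."""
    grad = [0] * (W * H)
    for y in range(H):
        window = [0, 0, 0]  # reuse buffer holding the last three pixels
        for x in range(W):
            window = [window[1], window[2], img[y][x]]
            if x >= 2:  # condition block: the window is full
                grad[x + y * W - 2] = window[2] - window[0]
    for buf_0 in range(H):  # the extra pass described above
        for buf_1 in range(W):
            channel.append(grad[buf_1 + buf_0 * W])


def sender_fused(img, channel):
    """Same reuse loop, with an if-else that writes to the channel in receiver order."""
    for y in range(H):
        window = [0, 0, 0]
        for x in range(W + 2):  # two extra iterations flush the row-end padding
            if x < W:
                window = [window[1], window[2], img[y][x]]
            if x >= 2:
                if x < W:
                    channel.append(window[2] - window[0])  # element x - 2
                else:
                    channel.append(0)  # padded elements W - 2 and W - 1


def receiver(channel):
    """Reads data[x + y*W] strictly in order, one element per read."""
    return [channel.popleft() for _ in range(W * H)]


img = [[x * x + 3 * y for x in range(W)] for y in range(H)]
two_pass, fused = deque(), deque()
sender_two_pass(img, two_pass)
sender_fused(img, fused)
assert receiver(two_pass) == receiver(fused) == reference(img)
```

In this sketch both senders push the same sequence into the channel, but the fused version does it inside the reuse loop, so no intermediate `grad` buffer or second traversal is needed. Whether the row-end padding is best handled by extending the loop bound, as assumed here, depends on how the actual condition block is generated.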