For example, suppose there are N PEs, each implemented as a task, with their outputs connected to N streams. I need to compute the sum of their outputs. What is an efficient way to implement this?
My current ideas:
1. Create a new task with N istreams and 1 ostream that computes the sum. However, I am not sure whether connecting that many istreams to a single task will cause issues.
2. Chain sum: each PE adds its own output to the running sum received from the previous PE and passes the result to the next PE. Obviously, the additions happen one after another, so the latency grows linearly with N.
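To make the two ideas concrete, here is a minimal software-only sketch in plain C++. It is only a behavioral model, not HLS code: `Stream` (a `std::queue<int>` standing in for a hardware FIFO), `ReduceAll`, `ChainStage`, and `ChainSum` are hypothetical names chosen for illustration, and each PE is assumed to produce exactly one `int`.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for a hardware stream: a plain FIFO of ints.
using Stream = std::queue<int>;

// Idea 1: a single reducer task reads one value from each of the N input
// streams and writes the total to one output stream.
void ReduceAll(std::vector<Stream>& in, Stream& out) {
  int sum = 0;
  for (Stream& s : in) {
    sum += s.front();
    s.pop();
  }
  out.push(sum);
}

// Idea 2, one stage: add this PE's own output to the partial sum coming
// from the previous stage and forward the result to the next stage.
void ChainStage(Stream& prev_partial, Stream& own, Stream& next_partial) {
  next_partial.push(prev_partial.front() + own.front());
  prev_partial.pop();
  own.pop();
}

// Idea 2, whole chain: the stages run strictly one after another, so the
// number of dependent additions equals the number of PEs.
int ChainSum(std::vector<Stream>& in) {
  Stream partial;
  partial.push(0);  // seed value fed into the first stage
  for (Stream& own : in) {
    Stream next;
    ChainStage(partial, own, next);
    partial = next;
  }
  return partial.front();
}
```

Both functions compute the same total. The difference the model highlights is structural: `ReduceAll` needs one task with N inputs, while `ChainSum` needs only two inputs per stage but serializes the additions.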
Is there any smarter way to implement this?