Closed by hyunjongL 1 year ago
Thanks, this is the part of the board I am having the most difficulty understanding. I have been measuring latency for different numbers of channels. One thing I noticed is that each channel adds computation overhead (about 20 µs, though I am not sure whether this depends on the model or data), beyond the time needed to write the data. Is there a control or synchronization cost for each processor that is incurred sequentially? For example, do loading the next weights and configuring the processors (layer info, TRAM reset, ...) happen in sequence?
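For anyone repeating this kind of measurement, here is a minimal sketch of one way to separate a fixed cost from a per-channel cost: fit a straight line to latency versus channel count. It assumes latency grows roughly linearly with the number of channels, which the thread does not confirm. The function name `fit_line` is hypothetical, and the numbers are placeholders generated from a made-up line, not measurements from the board.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x; zero means every x is identical.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("channel counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Placeholder values, NOT real measurements: generated from 100 + 20 * n
# purely to show the fit recovering a known slope and intercept.
channels = [1, 2, 4, 8]
latency_us = [100.0 + 20.0 * n for n in channels]

fixed_us, per_channel_us = fit_line(channels, latency_us)
print(f"fixed: {fixed_us:.1f} us, per channel: {per_channel_us:.1f} us")
```

With real data, the slope estimates the per-channel overhead and the intercept estimates the cost that does not depend on channel count. Repeating the fit for a different model or input would show whether the slope is model- or data-dependent.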
Configuration and the loading of data and weights happen sequentially on the selected clock. For the inference time, I'm going to send you a (simplified) piece of code that calculates the number of cycles for simple cases (i.e., no streaming, no element-wise operations). We'll eventually add this to the toolset, but it's not quite ready for that yet. Let me know where you want me to email the code.
Please send it to hyunjongl@kaist.ac.kr. Thanks!
There is a dark blue aggregator and three light blue aggregators.
Many thanks!