Closed by yunchenlo 4 years ago
Thanks Yun-Chen for reading our work.
The model storage capacity (memristor) of one PUMA node is ~70 MB = 128*128*2 B (per MVMU) * 2 (MVMUs per core) * 8 (cores per tile) * 138 (tiles per node). Consequently, workloads with model storage requirements > 70 MB (e.g., VGG-16) are mapped across multiple nodes over the chip-to-chip interconnect; see the off-chip network in Table 3.
No, our throughput simulation doesn't include configuration-time costs such as weight transfer and instruction transfer.
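The capacity arithmetic in this reply can be checked with a short script. This is only a sketch: the constants are the figures quoted above, the function names are made up for illustration, and the node count is a lower bound that assumes weights pack perfectly into crossbars. The 138 MB model size is the figure quoted in the question, not a measured value.

```python
# Figures quoted in the reply above (per PUMA node).
CROSSBAR_ROWS = 128
CROSSBAR_COLS = 128
BYTES_PER_WEIGHT = 2   # 16-bit weights
MVMUS_PER_CORE = 2
CORES_PER_TILE = 8
TILES_PER_NODE = 138


def node_capacity_bytes() -> int:
    """Memristor weight storage of one node, in bytes."""
    return (CROSSBAR_ROWS * CROSSBAR_COLS * BYTES_PER_WEIGHT
            * MVMUS_PER_CORE * CORES_PER_TILE * TILES_PER_NODE)


def nodes_needed(model_bytes: int) -> int:
    """Lower bound on node count (assumption: perfect packing, no padding)."""
    capacity = node_capacity_bytes()
    return -(-model_bytes // capacity)  # integer ceiling division


if __name__ == "__main__":
    print(node_capacity_bytes())          # 72351744 bytes
    print(node_capacity_bytes() / 2**20)  # 69.0 MiB, i.e. roughly 70 MB
    print(nodes_needed(138 * 2**20))      # 2 nodes for a 138 MiB model
```

A real mapping would likely need more nodes than this bound, since layer shapes rarely tile 128x128 crossbars exactly.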
Dear Aayush Ankit,
Thank you!
In addition, is the MVMU memristor capacity 128*128*(2 bytes)? I think each cell only holds 2 bits.
Yun-Chen
Each memristor cell holds 2 bits, and each MVMU has 8 physical memristive crossbars to represent 16 bit weights.
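To make the 2-bit-cell arithmetic concrete, here is a minimal sketch of how a 16-bit weight could be split across eight 2-bit cells, one per crossbar. The unsigned encoding, the least-significant-slice-first ordering, and the function names are assumptions for illustration only; the thread does not say how PUMA actually encodes or orders the slices.

```python
from typing import List

BITS_PER_CELL = 2        # stated in the reply above
CROSSBARS_PER_MVMU = 8   # stated in the reply above
WEIGHT_BITS = BITS_PER_CELL * CROSSBARS_PER_MVMU  # 16


def slice_weight(w: int) -> List[int]:
    """Split an unsigned 16-bit weight into eight 2-bit slices.

    Assumption: slice i (least significant first) goes to crossbar i.
    """
    if not 0 <= w < (1 << WEIGHT_BITS):
        raise ValueError("weight must fit in 16 unsigned bits")
    mask = (1 << BITS_PER_CELL) - 1
    return [(w >> (BITS_PER_CELL * i)) & mask
            for i in range(CROSSBARS_PER_MVMU)]


def combine_slices(slices: List[int]) -> int:
    """Inverse of slice_weight: shift each slice back and sum."""
    return sum(s << (BITS_PER_CELL * i) for i, s in enumerate(slices))


def mvmu_capacity_bytes(rows: int = 128, cols: int = 128) -> int:
    """rows * cols cells per crossbar, 8 crossbars, 2 bits per cell."""
    return rows * cols * CROSSBARS_PER_MVMU * BITS_PER_CELL // 8


if __name__ == "__main__":
    print(slice_weight(0xB4E1))                  # [1, 0, 2, 3, 0, 1, 3, 2]
    print(combine_slices(slice_weight(0xB4E1)))  # 46305 == 0xB4E1
    print(mvmu_capacity_bytes())                 # 32768 == 128*128*2 B
```

The last line shows why both views agree: 128*128 cells * 8 crossbars * 2 bits equals 128*128 weights * 2 bytes.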
This completes my understanding!
Thanks! Yun-Chen
Dear Aayush Ankit,
Thank you for your work.
I would like to ask how PUMA is capable of executing models that exceed PUMA's memory capacity.
For example, according to Table 3 in the PUMA paper, you have 4.3125 MB of memristor capacity in total. However, typical DNNs such as VGG-16 require 138 MB or more.
How does the PUMA accelerator handle this case? In addition, does the throughput simulation include weight transfer time when executing these models?
Thank you in advance! Yun-Chen Lo