Aayush-Ankit / puma-simulator

[ASPLOS 2019] PUMA-simulator provides a detailed simulation model of a dataflow architecture built with NVM (non-volatile memory), and runs ML models compiled using the puma compiler.
MIT License

Questions about executing VGG-16 #37

Closed yunchenlo closed 4 years ago

yunchenlo commented 4 years ago

Dear Aayush Ankit,

Thank you for your work.

I would like to ask how PUMA can execute models that exceed its memory capacity.

For example, according to Table 3 in the PUMA paper, the total memristor capacity is 4.3125 MB. However, typical DNNs such as VGG-16 require 138 MB or more.

How does the PUMA accelerator handle this case? In addition, does the throughput simulation include weight transfer time when executing these models?

Thank you in advance! Yun-Chen Lo

Aayush-Ankit commented 4 years ago

Thanks Yun-Chen for reading our work.

The model storage capacity (memristor) of one PUMA node is ~70 MB = 128*128*2 B (per MVMU) * 2 (MVMUs per core) * 8 (cores per tile) * 138 (tiles per node). Workloads whose model storage requirement exceeds 70 MB (e.g., VGG-16) are therefore mapped across multiple nodes over the chip-to-chip interconnect; see the off-chip network in Table 3.
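For readers following along, the arithmetic in this reply can be checked with a short sketch. The constants are the figures quoted above; the function names are hypothetical, and the 138 MB model size is taken from the question as-is. The node count is only a lower bound from raw capacity, assuming weights pack perfectly, which the actual compiler mapping may not achieve.

```python
# Figures quoted in the reply above.
CROSSBAR_DIM = 128        # an MVMU stores a 128 x 128 weight matrix
BYTES_PER_WEIGHT = 2      # 16-bit weights
MVMUS_PER_CORE = 2
CORES_PER_TILE = 8
TILES_PER_NODE = 138


def node_capacity_bytes():
    """Model storage capacity of one node, in bytes."""
    mvmu_bytes = CROSSBAR_DIM * CROSSBAR_DIM * BYTES_PER_WEIGHT
    return mvmu_bytes * MVMUS_PER_CORE * CORES_PER_TILE * TILES_PER_NODE


def nodes_needed(model_bytes):
    """Lower bound on node count (hypothetical helper).

    Assumes perfect packing of weights; uses integer ceiling division.
    """
    capacity = node_capacity_bytes()
    return -(-model_bytes // capacity)


if __name__ == "__main__":
    capacity = node_capacity_bytes()
    print(capacity, "bytes =", capacity / 2**20, "MiB")
    # 138 MB is the VGG-16 size quoted in the question.
    print(nodes_needed(138 * 2**20), "nodes")
```

With these figures one node holds 72,351,744 bytes (69 MiB, which rounds to the ~70 MB quoted), so a 138 MB model needs at least two nodes.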

No, our throughput simulation doesn't include configuration time, i.e., weight transfer and instruction transfer.

yunchenlo commented 4 years ago

Dear Aayush Ankit,

Thank you! One more question: is the MVMU memristor capacity 128*128*(2 bytes)? I thought each cell holds only 2 bits.

Yun-Chen

Aayush-Ankit commented 4 years ago

Each memristor cell holds 2 bits, and each MVMU has 8 physical memristive crossbars to represent 16-bit weights (8 * 2 bits = 16 bits).
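One way to picture this bit slicing is the sketch below, which splits a 16-bit weight into eight 2-bit slices, one per crossbar, and reassembles it. The function names, the unsigned representation, and the least-significant-slice-first ordering are assumptions for illustration; the simulator's actual encoding may differ.

```python
BITS_PER_CELL = 2                                    # from the reply above
WEIGHT_BITS = 16                                     # from the reply above
CROSSBARS_PER_MVMU = WEIGHT_BITS // BITS_PER_CELL    # 8 physical crossbars
CROSSBAR_DIM = 128


def slice_weight(weight):
    """Split an unsigned 16-bit weight into 2-bit slices.

    Assumption: slice i (least significant first) goes to crossbar i.
    """
    mask = (1 << BITS_PER_CELL) - 1
    return [(weight >> (BITS_PER_CELL * i)) & mask
            for i in range(CROSSBARS_PER_MVMU)]


def combine_slices(slices):
    """Reassemble the weight from its per-crossbar slices."""
    return sum(s << (BITS_PER_CELL * i) for i, s in enumerate(slices))


def mvmu_capacity_bytes():
    """Logical MVMU capacity: cells * bits per cell * crossbars, in bytes."""
    bits = CROSSBAR_DIM * CROSSBAR_DIM * BITS_PER_CELL * CROSSBARS_PER_MVMU
    return bits // 8


if __name__ == "__main__":
    slices = slice_weight(0xABCD)
    print(slices)
    print(hex(combine_slices(slices)))
    print(mvmu_capacity_bytes())
```

Under these assumptions the logical capacity comes out to 32,768 bytes per MVMU, which matches the 128*128*2 B term used in the node capacity calculation.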

yunchenlo commented 4 years ago

This completes my understanding!

Thanks! Yun-Chen