In this project, we implement a two-stream action recognition system on an FPGA. Our model requires around 10 times fewer operations than other C3D-based FPGA systems and reaches near real-time throughput (10 to 15 FPS) while keeping similar accuracy on UCF101 with a ResNet18 backbone.

Architecture | Accuracy | GOPs | Size (MB) | Backbone |
---|---|---|---|---|
F-C3D[1] | 79% | 76 | 321 | C3D |
F-E3D[2] | 85% | 12.2 | 8.6 | E3DNet |
Sun et al.[3] | 88% | 26.13 | 126 | (2+1)D |
Ours | 86% | 4.12 | 22.3 | ResNet18 |
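
To make the operation-count comparison concrete, the following minimal sketch computes how many times more operations each baseline needs than our model, using only the GOPs column of the table above. The helper name `reduction_factors` is hypothetical and is not part of this repository.

```python
# GOPs values copied from the comparison table above.
GOPS = {
    "F-C3D": 76.0,
    "F-E3D": 12.2,
    "Sun et al.": 26.13,
    "Ours": 4.12,
}


def reduction_factors(gops, ours="Ours"):
    """Return, for each baseline, its GOPs divided by the GOPs of `ours`.

    Hypothetical helper for illustration only.
    """
    base = gops[ours]
    return {name: value / base for name, value in gops.items() if name != ours}


factors = reduction_factors(GOPS)
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x the operations of ours")

# Unweighted mean over the three baselines in the table.
mean_factor = sum(factors.values()) / len(factors)
print(f"mean over baselines: {mean_factor:.2f}x")
```

With the table's values, the per-baseline factor ranges from roughly 3x (F-E3D) to roughly 18x (F-C3D), and the unweighted mean is a little over 9x.
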

We provide a pre-built bitstream and a pre-trained model for a quick demo. Please refer to `zcu102_demo`.
#### Run HLS CNN + bitstream
> make
#### Run bitstream only
> make bitstream
#### Run HLS CNN
> make hls