Idein / py-videocore

Python library for GPGPU on Raspberry Pi
MIT License

Use VCSM for cache-enabled ARM-QPU shared buffer #28

Closed: Terminus-IMRC closed this 6 years ago

Terminus-IMRC commented 6 years ago

VideoCore Shared Memory (VCSM) is a service provided by the official Raspberry Pi firmware that can allocate memory on the GPU and clean/invalidate the CPU cache.

This PR lets py-videocore use VCSM to allocate ARM-QPU shared buffers with the CPU cache enabled. To maintain cache coherency, users who enable the CPU cache must explicitly clean/invalidate it. For example usage, see https://github.com/nineties/py-videocore/commit/9c0b450d08fb5b8119dfa62e2db7715303fcceb1.
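The clean/invalidate rule can be illustrated with a small, self-contained model. This is a hypothetical sketch in plain Python, not the py-videocore or VCSM API: `SharedBuffer`, `clean()` and `invalidate()` are illustrative names, and the CPU cache is modeled as a simple write-back cache between the ARM and the memory that the QPU accesses directly.

```python
class SharedBuffer:
    """Hypothetical model of an ARM-QPU shared buffer with a write-back CPU cache."""

    def __init__(self, size):
        self.memory = [0] * size  # backing memory, accessed directly by the QPU
        self.cache = {}           # CPU cache: index -> [value, dirty]

    # --- CPU side: every access goes through the cache ---
    def cpu_write(self, i, value):
        self.cache[i] = [value, True]  # write-back: memory is not updated yet

    def cpu_read(self, i):
        if i not in self.cache:
            self.cache[i] = [self.memory[i], False]  # cache miss: fill from memory
        return self.cache[i][0]

    def clean(self):
        """Write dirty cached values back to memory (do this before the QPU reads)."""
        for i, line in self.cache.items():
            if line[1]:
                self.memory[i] = line[0]
                line[1] = False

    def invalidate(self):
        """Discard all cached values so the next CPU read refetches from memory
        (do this after the QPU writes). Dirty values are lost, so clean first."""
        self.cache.clear()

    # --- QPU side: bypasses the CPU cache ---
    def gpu_read(self, i):
        return self.memory[i]

    def gpu_write(self, i, value):
        self.memory[i] = value


buf = SharedBuffer(4)

# CPU -> QPU: without a clean, the QPU still sees the old memory contents.
buf.cpu_write(0, 42)
assert buf.gpu_read(0) == 0
buf.clean()
assert buf.gpu_read(0) == 42

# QPU -> CPU: without an invalidate, the CPU keeps returning its stale cached copy.
assert buf.cpu_read(1) == 0  # the CPU now caches memory[1]
buf.gpu_write(1, 7)
assert buf.cpu_read(1) == 0
buf.invalidate()
assert buf.cpu_read(1) == 7
```

The ordering carries over to a real program: clean after the CPU writes the inputs and before the QPU runs, and invalidate after the QPU finishes and before the CPU reads the results.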

With this PR, users in the video group no longer need sudo to run QPU programs with py-videocore.
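A quick way to check whether the current process already runs with the `video` group uses only the standard-library `grp` and `os` modules (Unix only). This is a convenience sketch, not part of py-videocore; `in_group` is a hypothetical helper name. Group membership changes, e.g. via `sudo usermod -aG video $USER`, only take effect in new login sessions.

```python
import grp
import os


def in_group(group_name: str) -> bool:
    """Return True if the current process has group_name among its groups."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # no such group on this system
    # os.getgroups() lists supplementary groups; os.getegid() is the effective primary group.
    return gid in os.getgroups() or gid == os.getegid()


if __name__ == "__main__":
    if in_group("video"):
        print("In the video group: per this PR, sudo should not be needed.")
    else:
        print("Not in the video group (or the group does not exist).")
```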

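This PR also resolves https://github.com/nineties/py-videocore/issues/17. Compare the numpy (CPU) timings of the two runs below: sgemm_cached.py is about 7x faster on the CPU side (2.0804 sec vs. 15.0497 sec), while the GPU time stays about the same: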

$ python example/sgemm.py
==== sgemm example (96x363 times 363x3072) ====
threads: 12
numpy: 15.0497 sec, 0.0143 Gflops
GPU: 0.0283 sec, 7.5929 Gflops
minimum absolute error: 0.0000e+00
maximum absolute error: 9.1553e-04
minimum relative error: 0.0000e+00
maximum relative error: 1.0522e+01

$ python example/sgemm_cached.py
==== sgemm example (96x363 times 363x3072) ====
threads: 12
numpy: 2.0804 sec, 0.1033 Gflops
GPU: 0.0298 sec, 7.2091 Gflops
minimum absolute error: 0.0000e+00
maximum absolute error: 9.1553e-04
minimum relative error: 0.0000e+00
maximum relative error: 1.0522e+01
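The Gflops figures in both logs can be reproduced from the timings. The sketch below assumes the examples count 2MNK + 3MN floating-point operations (the multiply-adds of A times B, presumably plus the alpha/beta scaling and the final addition), with M = 96, K = 363, N = 3072. That operation count is inferred from the logged numbers, not taken from the example sources.

```python
# Recompute the Gflops figures in the two sgemm logs from their timings.
# Assumption: flops = 2*M*N*K + 3*M*N, inferred from the logged values.
M, K, N = 96, 363, 3072


def sgemm_flops(m, k, n):
    """Assumed flop count for C = alpha*(A @ B) + beta*C, A is m x k, B is k x n."""
    return 2 * m * n * k + 3 * m * n


def gflops(seconds, m=M, k=K, n=N):
    return sgemm_flops(m, k, n) / seconds / 1e9


# name -> (logged time in seconds, logged Gflops)
runs = {
    "numpy (uncached)": (15.0497, 0.0143),
    "GPU (uncached)": (0.0283, 7.5929),
    "numpy (cached)": (2.0804, 0.1033),
    "GPU (cached)": (0.0298, 7.2091),
}

for name, (sec, logged) in runs.items():
    print(f"{name:18s} {gflops(sec):.4f} Gflops (logged: {logged})")

print(f"numpy speedup with cached buffers: {15.0497 / 2.0804:.2f}x")
```

The recomputed GPU figures differ from the logged ones in the fourth digit because the logged times are rounded to four decimal places.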

nineties commented 6 years ago

LGTM. Thank you!

Terminus-IMRC commented 6 years ago

Thanks!