doe300 / VC4CL

OpenCL implementation running on the VideoCore IV GPU of the Raspberry Pi models
MIT License

NOTE: VC4CL will NOT work with Raspberry Pi 4, since it has an incompatible GPU!

VC4CL

VC4CL is an implementation of the OpenCL 1.2 standard for the VideoCore IV GPU (found in Raspberry Pi 1 - 3 models).

The implementation consists of:

OpenCL-Support

The VC4CL implementation supports the EMBEDDED PROFILE of the OpenCL standard version 1.2. In addition, the cl_khr_icd extension is supported, so that VC4CL can be found by an Installable Client Driver (ICD) loader. This allows VC4CL to be used in parallel with another OpenCL implementation, e.g. pocl, which executes OpenCL code on the host CPU.

OpenCL 1.2 was chosen as the target version because it is the last version of the OpenCL standard whose mandatory features can all be supported.

The EMBEDDED PROFILE is a trimmed version of the default FULL PROFILE. The most notable features not supported by the VC4CL implementation are images, the long and double data types, device-side printf and device partitioning. See RuntimeLibrary for more details on supported and unsupported features.

VideoCore IV GPU

The VideoCore IV GPU, in the configuration found in the Raspberry Pi models, has a theoretical maximum performance of 24 GFLOPS and is therefore very powerful in comparison to the host CPU. The GPU (which is located on the same chip as the CPU) has 12 cores, each capable of running independent instructions, natively supports a SIMD vector width of 16 elements and can access the RAM directly via DMA.

Required software

Build

The following configuration options are available in CMake:

Khronos ICD Loader

The Khronos ICD Loader allows multiple OpenCL implementations to be used in parallel (e.g. VC4CL and pocl), but requires a bit of manual configuration: create a file /etc/OpenCL/vendors/VC4CL.icd with a single line containing the absolute path to the VC4CL library.

The program clinfo can be used to test whether the ICD loader finds the VC4CL implementation. Note: the version in the official Raspbian repository is too old and has a bug (see fix), so it must be compiled from the GitHub repository.

Security Considerations

Because the DMA interface has no MMU between the GPU and the RAM, code executed on the GPU can access any part of the main memory! This means that an OpenCL kernel could be used to read sensitive data or write into kernel memory!

Depending on the configuration used for VC4CL (see Experimental features below), the process using the VC4CL library must either run as root (e.g. via sudo <program>) or belong to the video group. The v3d_info and v3d_profiling tools in this project must be run as root to report the maximum amount of information.

Debug

Since this software is still in development, some functionality might not work. For curious users, or to provide more information in bug reports, additional debug information can be generated.

To generate debug information, set the VC4CL_DEBUG environment variable to one or more (comma-separated) of the following strings:

Experimental features

Mostly for development, performance comparison and debugging purposes, the system interfaces used for specific system accesses can be selected via the following environment variables: