Hand-optimized FPGA Implementations of NPBench Kernels

Field-programmable gate arrays (FPGAs) offer an alternative computing platform to traditional CPU- and GPU-based systems. Unfortunately, their advantages are difficult to realize fully because of programming challenges such as the lack of automatic memory management and programmer-transparent caches. To address this, various frameworks aim to compile code written in a high-level language such as Python to an FPGA-compatible bitstream. Evaluating these frameworks involves measuring how close the performance of an automatically generated implementation comes to optimal, which requires hand-optimized code as a reference. This project provides such FPGA implementations of five numerical kernels taken from the NPBench benchmark suite. We experimentally evaluate our implementations using synthesis, emulation, and hardware execution on a state-of-the-art Xilinx FPGA. Our implementations achieve up to 10.7x higher performance than comparable CPU- and GPU-based versions.

Benchmarks

Azimint: baseline - opt1 - opt2
Durbin: baseline - opt1 - opt2
Gram-Schmidt: baseline - opt1
Cavity Flow: baseline - opt1
Conv2D: baseline - opt1 - opt2

Building

Clone the repository (including the definelicht/hlslib submodule):

git clone --recursive git@github.com:fabianlandwehr1/DPHPC-Project.git
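
If the repository was cloned without `--recursive`, the hlslib submodule directory will be empty and the build is likely to fail. The sketch below shows one way to fetch missing submodules in an existing checkout. The function name `fetch_submodules` is a hypothetical helper for illustration and is not part of this repository; it only wraps the standard `git submodule update --init --recursive` command.

```shell
# Hypothetical helper (not part of the repository): fetch any submodules
# that are missing from an existing checkout.
fetch_submodules() {
    # Directory of the checkout; defaults to the current directory.
    repo_dir="${1:-.}"
    # Refuse to run outside a Git working tree.
    if ! git -C "$repo_dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "error: $repo_dir is not a Git working tree" >&2
        return 1
    fi
    # Initialize and update all submodules, including nested ones.
    git -C "$repo_dir" submodule update --init --recursive
}
```

For example, `fetch_submodules DPHPC-Project` could be run from the parent directory of the checkout, assuming the default clone directory name.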

Configure the build system using CMake:

mkdir build
cd build
cmake ..

Build and run a test (example):

make TestAzimintXilinx
azimint/TestAzimintXilinx

Synthesize HLS for Xilinx:

make synthesize_azimint
