daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

DNN fallback CPU ops #226

Open daphne-eu opened 2 years ago

daphne-eu commented 2 years ago

In GitLab by @corepointer on Mar 18, 2022, 11:31

We have most of our neural network operators (convolutions et al.) implemented as calls to cuDNN. An initial operator (pooling) has been ported from SystemDS. The remaining operations need a C++ implementation (or porting), test cases, and ideally a comparison to the Java version.

pdamme commented 2 months ago

Motivation: Deep neural networks (DNNs), such as convolutional neural networks (CNNs), are widely used in machine learning. They consist of multiple layers, which consist of basic operations. DNN workloads are often executed on hardware accelerators (e.g., GPUs). In fact, DAPHNE offers many typical DNN operations and CUDA-based GPU kernels for them. Nevertheless, it would be helpful to have purely CPU-based implementations for the most important DNN operations, as they would allow users who don’t have a GPU set up to still play around with DNNs in DAPHNE (at least on small data).

Task: This task is to implement (in C++) kernels for at least the following typical DNN operations: convolution, pooling (max and avg; already implemented), bias add, and batch normalization. These operations should be supported for both the forward pass and the backward pass. That is, kernels for the following DaphneIR ops are expected: Conv2DForwardOp, Conv2DBackwardFilterOp, Conv2DBackwardDataOp, MaxPoolForwardOp, AvgPoolForwardOp, MaxPoolBackwardOp, AvgPoolBackwardOp, BiasAddForwardOp, BatchNorm2DForwardOp, and BatchNorm2DBackwardOp (some of these ops still need to be added to DaphneIR).
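As a rough starting point, a minimal sketch of what one of these CPU kernels could look like is shown below, using bias add as the simplest case. It works on a raw row-major buffer instead of DenseMatrix and omits the context argument; the struct-with-apply pattern plus a convenience function is modelled on the existing kernels, and the exact parameter list is an illustrative assumption, not a fixed interface.

```cpp
#include <cstddef>

// Minimal sketch of a CPU kernel for BiasAddForwardOp on a raw row-major
// N x (C*H*W) buffer. Working on raw pointers (instead of DenseMatrix) and
// the parameter list are illustrative assumptions.
template<typename VT>
struct BiasAddForward {
    // data: N x (C*H*W) input/output buffer; bias: one value per channel.
    static void apply(VT * data, const VT * bias,
                      size_t N, size_t C, size_t H, size_t W) {
        for(size_t n = 0; n < N; n++)
            for(size_t c = 0; c < C; c++)
                for(size_t p = 0; p < H * W; p++)
                    data[(n * C + c) * H * W + p] += bias[c];
    }
};

// Convenience wrapper, analogous to the free functions around other kernels.
template<typename VT>
void biasAddForward(VT * data, const VT * bias,
                    size_t N, size_t C, size_t H, size_t W) {
    BiasAddForward<VT>::apply(data, bias, N, C, H, W);
}
```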

The kernels should work on DAPHNE's DenseMatrix data type. For the data layout, the typical N x C*H*W format shall be used, i.e., each of the N rows of the matrix contains the data of one image. Each image has C channels (e.g., RGB) and a size of H x W pixels. In memory, all pixels of a channel are stored contiguously; within each channel, the data is stored in row-major order. The snippet below makes this index arithmetic explicit.
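The following snippet shows the column index of a pixel within one row of such a matrix (the helper name is hypothetical and only serves to illustrate the layout):

```cpp
#include <cstddef>

// Column index of pixel (h, w) of channel c within one row of the
// N x (C*H*W) matrix (hypothetical helper, for illustration only).
inline size_t colIdx(size_t c, size_t h, size_t w, size_t H, size_t W) {
    return c * (H * W) + h * W + w;
}

// Example: for C = 3 (RGB) and H = W = 2, the 12 columns of one row are
//   R(0,0) R(0,1) R(1,0) R(1,1)  G(0,0) ... G(1,1)  B(0,0) ... B(1,1)
// so colIdx(1, 1, 0, 2, 2) == 6, i.e., pixel (1,0) of the G channel.
```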

In addition to the kernels, new unit test cases should be added in test/runtime/local/kernels/ (akin to the existing ones).
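As an illustration, a test for the bias-add sketch above could look roughly like this, assuming the Catch2 framework used by the existing kernel tests; the include path and the tag name are assumptions:

```cpp
#include <catch.hpp>
#include <cstddef>
#include <vector>
// #include <runtime/local/kernels/BiasAddForward.h>  // hypothetical header for the sketch above

TEST_CASE("BiasAddForward adds the per-channel bias to every pixel", "[kernels]") {
    // One image (N=1), two channels (C=2), 2x2 pixels each (H=W=2).
    std::vector<float> data = {1, 2, 3, 4,   5, 6, 7, 8};
    const std::vector<float> bias = {10, 100};
    biasAddForward(data.data(), bias.data(), 1, 2, 2, 2);
    const std::vector<float> expected = {11, 12, 13, 14,   105, 106, 107, 108};
    CHECK(data == expected);
}
```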

Optional extensions of the task (if desired and time is left):

Hints:

pdamme commented 2 months ago

A PR/commit solving this issue should also update the DaphneDSL built-in function reference (doc/daphnedsl/Builtins.md), which currently says "Note that most of these operations only have a CUDNN-based kernel for GPU execution at the moment."