daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

DNN fallback CPU ops #226

Open daphne-eu opened 2 years ago

daphne-eu commented 2 years ago

In GitLab by @corepointer on Mar 18, 2022, 11:31

We have most of our neural network operators (convolutions et al.) implemented as calls to cuDNN. An initial operator (pooling) has been ported from SystemDS. The remaining operations need a C++ implementation (or porting), test cases, and ideally a comparison to the Java version.

pdamme commented 2 months ago

Motivation: Deep neural networks (DNNs), such as convolutional neural networks (CNNs), are widely used in machine learning. They consist of multiple layers, which consist of basic operations. DNN workloads are often executed on hardware accelerators (e.g., GPUs). In fact, DAPHNE offers many typical DNN operations and CUDA-based GPU kernels for them. Nevertheless, it would be helpful to have purely CPU-based implementations for the most important DNN operations, as they would allow users who don’t have a GPU set up to still play around with DNNs in DAPHNE (at least on small data).

Task: This task is to implement (in C++) kernels for at least the following typical DNN operations: convolution, pooling (max and avg; already implemented), bias add, and batch normalization. These operations should be supported for both the forward pass and the backward pass. That is, kernels for the following DaphneIR ops are expected: Conv2DForwardOp, Conv2DBackwardFilterOp, Conv2DBackwardDataOp, MaxPoolForwardOp, AvgPoolForwardOp, MaxPoolBackwardOp, AvgPoolBackwardOp, BiasAddForwardOp, BatchNorm2DForwardOp, and BatchNorm2DBackwardOp (some of these ops still need to be added to DaphneIR).
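As a rough starting point, a minimal sketch of what one of these CPU kernels could look like is shown below, using bias add as the simplest case. It works on a raw row-major buffer instead of DenseMatrix and omits the context argument; the struct-with-apply pattern plus a convenience function is modelled on the existing kernels, and the exact parameter list is an illustrative assumption, not a fixed interface.

```cpp
#include <cstddef>

// Minimal sketch of a CPU kernel for BiasAddForwardOp on a raw row-major
// N x (C*H*W) buffer. Working on raw pointers (instead of DenseMatrix) and
// the parameter list are illustrative assumptions.
template<typename VT>
struct BiasAddForward {
    // data: N x (C*H*W) input/output buffer; bias: one value per channel.
    static void apply(VT * data, const VT * bias,
                      size_t N, size_t C, size_t H, size_t W) {
        for(size_t n = 0; n < N; n++)
            for(size_t c = 0; c < C; c++)
                for(size_t p = 0; p < H * W; p++)
                    data[(n * C + c) * H * W + p] += bias[c];
    }
};

// Convenience wrapper, analogous to the free functions around other kernels.
template<typename VT>
void biasAddForward(VT * data, const VT * bias,
                    size_t N, size_t C, size_t H, size_t W) {
    BiasAddForward<VT>::apply(data, bias, N, C, H, W);
}
```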

The kernels should work on DAPHNE's DenseMatrix data type. For the data layout, the typical N x C*H*W format shall be used, i.e., each of the N rows of the matrix contains the data of one image. Each image has C channels (e.g., RGB) and a size of H x W pixels. In memory, all pixels of a channel are stored contiguously; within each channel, the data is stored in row-major order. The snippet below makes this index arithmetic explicit.
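The following snippet shows the column index of a pixel within one row of such a matrix (the helper name is hypothetical and only serves to illustrate the layout):

```cpp
#include <cstddef>

// Column index of pixel (h, w) of channel c within one row of the
// N x (C*H*W) matrix (hypothetical helper, for illustration only).
inline size_t colIdx(size_t c, size_t h, size_t w, size_t H, size_t W) {
    return c * (H * W) + h * W + w;
}

// Example: for C = 3 (RGB) and H = W = 2, the 12 columns of one row are
//   R(0,0) R(0,1) R(1,0) R(1,1)  G(0,0) ... G(1,1)  B(0,0) ... B(1,1)
// so colIdx(1, 1, 0, 2, 2) == 6, i.e., pixel (1,0) of the G channel.
```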

In addition to the kernels, new unit test cases should be added in test/runtime/local/kernels/ (akin to the existing ones).
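As an illustration, a test for the bias-add sketch above could look roughly like this, assuming the Catch2 framework used by the existing kernel tests; the include path and the tag name are assumptions:

```cpp
#include <catch.hpp>
#include <cstddef>
#include <vector>
// #include <runtime/local/kernels/BiasAddForward.h>  // hypothetical header for the sketch above

TEST_CASE("BiasAddForward adds the per-channel bias to every pixel", "[kernels]") {
    // One image (N=1), two channels (C=2), 2x2 pixels each (H=W=2).
    std::vector<float> data = {1, 2, 3, 4,   5, 6, 7, 8};
    const std::vector<float> bias = {10, 100};
    biasAddForward(data.data(), bias.data(), 1, 2, 2, 2);
    const std::vector<float> expected = {11, 12, 13, 14,   105, 106, 107, 108};
    CHECK(data == expected);
}
```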

Optional extensions of the task (if desired and time is left):

Hints:

pdamme commented 2 months ago

A PR/commit solving this issue should also update the DaphneDSL built-in function reference (doc/daphnedsl/Builtins.md), which currently says "Note that most of these operations only have a CUDNN-based kernel for GPU execution at the moment."