artyom-beilis / dlprimitives

Deep Learning Primitives and Mini-Framework for OpenCL
http://blog.dlprimitives.org/
MIT License

DLPrimitives

This project aims to provide cross-platform OpenCL tools for deep learning training and inference.

Today, most deep learning training is done on NVidia GPUs using the closed-source CUDA and cuDNN libraries. Using AMD or Intel GPUs is either challenging or virtually impossible. For example, AMD provides the ROCm platform, but it does not yet support RDNA GPUs (more than a year after their release), does not support APUs, and does not support any operating system other than Linux.

Goals

Please note that this is a work in progress, still in its first, preliminary stages.

Initial Framework Integration

Integration with existing frameworks:

Integration With ONNX

ONNX model loading and inference have been tested on the following ImageNet networks:

Documentation

Documentation is published at http://dlprimitives.org/docs/

Features Matrix

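| Operator                | Features                              | Comment |
|-------------------------|---------------------------------------|---------|
| Softmax                 | Softmax, LogSoftmax                   |         |
| NLLLoss                 |                                       |         |
| MSELoss                 |                                       |         |
| SoftmaxWithLoss         |                                       |         |
| Elementwise             | ax+by, max(ax,by), ax*y, broadcasting |         |
| Concat                  |                                       |         |
| Slice                   |                                       |         |
| Pooling2D               | max, average                          |         |
| GlobalPooling           | max, average                          | 2D only |
| GlobalAvgPool2d         |                                       |         |
| InnerProduct            |                                       |         |
| BatchNorm               |                                       |         |
| Reshape                 |                                       |         |
| Squeeze                 |                                       |         |
| Flatten                 |                                       |         |
| Threshold               |                                       |         |
| Hardtanh                |                                       |         |
| Abs                     |                                       |         |
| Parameter               |                                       | Utility |
| Reduction               | Sum, Mean, Sum Squares, L1            |         |
| Convolution2D           | GEMM, Winograd, Depthwise Separable   |         |
| TransposedConvolution2D | GEMM, Winograd, Depthwise Separable   |         |
| Activation              | relu, sigmoid, tanh, relu6            |         |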
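
As a rough illustration of how the Softmax, NLLLoss and SoftmaxWithLoss rows relate, here is a minimal plain-Python sketch. It assumes the conventional definitions (SoftmaxWithLoss as log-softmax followed by negative log-likelihood, averaged over the batch). The function names are hypothetical and are not the DLPrimitives API; the library's actual reduction and options may differ.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of class scores."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]

def softmax(logits):
    """Softmax expressed through log-softmax."""
    return [math.exp(v) for v in log_softmax(logits)]

def nll_loss(log_probs, label):
    """Negative log-likelihood of the true class for one sample."""
    return -log_probs[label]

def softmax_with_loss(batch_logits, labels):
    """Assumed convention: mean over the batch of NLL(log_softmax(row))."""
    losses = [nll_loss(log_softmax(row), y)
              for row, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

if __name__ == "__main__":
    print(softmax([1.0, 2.0, 3.0]))
    print(softmax_with_loss([[0.0, 0.0, 0.0, 0.0]], [2]))
```

With uniform scores over four classes, the loss in the last line is the natural logarithm of 4, which is a quick sanity check for any implementation of these operators.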

Solvers: SGD, Adam
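
For readers unfamiliar with the two solvers, the textbook update rules can be sketched in plain Python as follows. This is only an illustration under stated assumptions: plain SGD without momentum or weight decay, and Adam with its commonly used default hyperparameters. The function names and the state dictionary are hypothetical, not the DLPrimitives interface.

```python
import math

def sgd_step(params, grads, lr=0.1):
    """Plain SGD: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

def new_adam_state(n):
    """Hypothetical helper: zeroed moment estimates for n parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}

def adam_step(params, grads, state, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moments."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialisation.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

if __name__ == "__main__":
    print(sgd_step([1.0], [2.0], lr=0.1))
    state = new_adam_state(1)
    print(adam_step([1.0], [2.0], state))
```

On the first Adam step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.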

Tested GPUs

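| Device      | Vendor | Notes                         |
|-------------|--------|-------------------------------|
| RX 6600XT   | AMD    | ROCr                          |
| RX 560      | AMD    | 16cu model, ROCm, PAL, Clover |
| HD 530      | Intel  | i5-6600, NEO driver           |
| GTX 960     | NVidia |                               |
| GTX 1080    | NVidia |                               |
| RTX 2060S   | NVidia |                               |
| MaliG52 MC2 | ARM    | performance not optimised yet |
| M1 Max      | Apple  | 32-core model                 |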

Devices tested on Windows: AMD RX 560, NVidia GTX 960.

Devices tested on macOS: Apple M1 Max.

Other features