etrommer / torch-approx

GPU-accelerated Neural Network layers using Approximate Multiplications for PyTorch
https://etrommer.de/torch-approx
MIT License

Implement Approximate Depthwise Convolution Kernels #1

Closed etrommer closed 1 year ago

etrommer commented 1 year ago

Benchmarking has shown that the Im2Col + ApproxGeMM path is extremely slow for depthwise-separable convolution operations.

This should be addressed by adding dedicated Approximate DWConv operators.

The accurate FP32 DWConv operators should be used as a template.
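
For illustration, here is a minimal pure-Python sketch of what a dedicated approximate depthwise operator computes. It is a reference for the arithmetic only, not a GPU kernel and not torch-approx's actual API. The names `build_lut` and `approx_dwconv2d` are hypothetical, and the sketch assumes that the approximate multiplier is modelled as a lookup table over signed 8-bit operand pairs, with stride 1 and no padding. Each channel is convolved directly with its own filter, so no Im2Col buffer or GeMM call is involved.

```python
from itertools import product


def build_lut(approx_mul, bits=8):
    """Tabulate a (hypothetical) approximate multiplier for all signed operand pairs."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    return {(a, b): approx_mul(a, b) for a, b in product(range(lo, hi), repeat=2)}


def approx_dwconv2d(x, w, lut):
    """Direct depthwise convolution (cross-correlation), stride 1, no padding.

    x: input as nested lists [C][H][W] of ints
    w: filters as nested lists [C][K][K] of ints, one filter per channel
    lut: dict mapping (activation, weight) -> approximate product
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(w[0])
    out_h, out_w = height - k + 1, width - k + 1
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(channels)]
    for c in range(channels):
        for i in range(out_h):
            for j in range(out_w):
                acc = 0
                for ki in range(k):
                    for kj in range(k):
                        # Every product is a table lookup instead of a multiplication.
                        acc += lut[(x[c][i + ki][j + kj], w[c][ki][kj])]
                out[c][i][j] = acc
    return out


# Example multiplier (an assumption for this sketch): clear the lowest product bit.
truncating_lut = build_lut(lambda a, b: (a * b) & ~1)
exact_lut = build_lut(lambda a, b: a * b)

x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
w = [[[1, 0], [0, 1]], [[2, 2], [2, 2]]]
print(approx_dwconv2d(x, w, exact_lut))
print(approx_dwconv2d(x, w, truncating_lut))
```

Passing an exact multiplier to `build_lut` gives the accurate result, which makes it easy to check an approximate kernel against the accurate template on the same inputs.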