Closed: etrommer closed this issue 2 years ago.
Benchmarking has shown that Im2Col + ApproxGeMM is extremely slow for depthwise-separable convolution operations.
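One plausible contributing factor, offered here as an assumption rather than a conclusion from the benchmark, is the GEMM shape that Im2Col produces for a depthwise convolution. With `groups == channels`, the lowering yields one tiny GEMM per channel whose weight matrix has a single row, so every Im2Col element that is materialized gets used in only one multiply-accumulate. The helpers `im2col_gemm_shapes` and `column_reuse` below are hypothetical, written only to make that arithmetic concrete:

```python
def im2col_gemm_shapes(c_in, c_out, k, h_out, w_out, groups=1):
    """Shapes produced by lowering a grouped 2D convolution with Im2Col.

    Each of the `groups` GEMMs multiplies a weight matrix (M x K)
    by an Im2Col column matrix (K x N). Returns (groups, M, K, N).
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    m = c_out // groups              # output channels per group
    kk = (c_in // groups) * k * k    # reduction length per output element
    n = h_out * w_out                # one column per output pixel
    return groups, m, kk, n


def column_reuse(c_in, c_out, k, h_out, w_out, groups=1):
    """Multiply-accumulates performed per materialized Im2Col element."""
    g, m, kk, n = im2col_gemm_shapes(c_in, c_out, k, h_out, w_out, groups)
    macs = g * m * kk * n
    col_elems = g * kk * n
    return macs / col_elems          # equals M, the row count of each GEMM


# Standard 3x3 conv, 64 -> 64 channels, 56x56 output
print(im2col_gemm_shapes(64, 64, 3, 56, 56))             # (1, 64, 576, 3136)
print(column_reuse(64, 64, 3, 56, 56))                   # 64.0
# Depthwise 3x3 conv on 64 channels (groups == channels)
print(im2col_gemm_shapes(64, 64, 3, 56, 56, groups=64))  # (64, 1, 9, 3136)
print(column_reuse(64, 64, 3, 56, 56, groups=64))        # 1.0
```

In the depthwise case each GEMM degenerates into a matrix-vector product with a reduction length of only 9, so the cost of building the Im2Col buffer is no longer amortized over many output channels.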
The slowdown should be addressed by adding dedicated approximate DWConv operators.
Accurate FP32 DWConv operators should be used as a template.
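A minimal sketch of what a dedicated operator could look like, assuming quantized integer operands and a pluggable multiplier. It follows the per-channel loop structure of a direct depthwise convolution, so no Im2Col buffer is built, and routes every product through `mul`. The names `depthwise_conv2d`, `exact_mul` and `truncated_mul` are hypothetical illustrations, not part of any existing API, and `truncated_mul` merely stands in for whatever approximate multiplier is being simulated:

```python
from typing import Callable, List

Tensor3 = List[List[List[int]]]  # indexed as [channel][row][col]
Mul = Callable[[int, int], int]


def exact_mul(a: int, b: int) -> int:
    return a * b


def truncated_mul(a: int, b: int, drop_bits: int = 2) -> int:
    """Hypothetical approximate multiplier: rounds the product down
    to a multiple of 2**drop_bits by clearing its low bits."""
    return ((a * b) >> drop_bits) << drop_bits


def depthwise_conv2d(x: Tensor3, w: Tensor3, mul: Mul = exact_mul,
                     stride: int = 1) -> Tensor3:
    """Direct depthwise 2D convolution without padding ('valid' output).

    Channel c of x is convolved only with kernel c of w; every product
    goes through `mul`, while accumulation stays exact.
    """
    if len(x) != len(w):
        raise ValueError("depthwise conv needs one kernel per channel")
    out: Tensor3 = []
    for xc, wc in zip(x, w):
        kh, kw = len(wc), len(wc[0])
        h_out = (len(xc) - kh) // stride + 1
        w_out = (len(xc[0]) - kw) // stride + 1
        plane = []
        for i in range(h_out):
            row = []
            for j in range(w_out):
                acc = 0
                for u in range(kh):
                    for v in range(kw):
                        acc += mul(xc[i * stride + u][j * stride + v], wc[u][v])
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
w = [[[1, 0], [0, 1]],
     [[2, 2], [2, 2]]]
print(depthwise_conv2d(x, w))                 # [[[6, 8], [12, 14]], [[8, 8], [8, 8]]]
print(depthwise_conv2d(x, w, truncated_mul))  # [[[4, 4], [12, 12]], [[0, 0], [0, 0]]]
```

Keeping the multiplier as a parameter lets the same loop nest serve both as an accurate reference and as the approximate variant; a production operator would of course be written in C++/CUDA rather than pure Python.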