AILab-CVC / UniRepLKNet

[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
https://arxiv.org/abs/2311.15599
Apache License 2.0
863 stars, 52 forks

Depthwise conv does not seem to be accelerated #7

Closed ZhuShengchen closed 6 months ago

ZhuShengchen commented 6 months ago

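I used the cutlass depthwise implementation you provided for acceleration, but on an input of shape (1536, 90, 180) it seems to be 2-4x slower than torch's depthwise conv. The GPU is an A100 and the driver is 11.8. What could be causing this?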

DingXiaoH commented 6 months ago

When the batch size is very small, cutlass depthwise is slower than nn.Conv2d. I will benchmark the two at different batch sizes. I'll run the test right after I finish eating.
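For anyone who wants to reproduce the comparison, here is a minimal timing sketch for sweeping batch sizes. It is not the maintainer's benchmark. It assumes the reported shape (1536, 90, 180) is (channels, height, width) with a batch size of 1, and it assumes a 13x13 kernel, which the thread does not state. The helpers `time_fn`, `sweep_batch_sizes`, and `torch_depthwise` are hypothetical names introduced here. The cutlass-based layer is deliberately not imported; pass your own factory for it as `make_conv` to time it the same way.

```python
import time

try:
    import torch
    import torch.nn as nn
except ImportError:  # the timing helper below still works without PyTorch
    torch = None


def time_fn(fn, warmup=3, iters=10, sync=None):
    """Return mean seconds per call of fn, measured after warmup calls.

    sync is an optional callable (e.g. torch.cuda.synchronize) that is
    invoked before starting and before stopping the clock, so that
    asynchronous GPU kernels are fully counted.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters


def torch_depthwise(channels, kernel_size=13):
    """Plain PyTorch depthwise conv; kernel_size=13 is an assumption."""
    return nn.Conv2d(channels, channels, kernel_size,
                     padding=kernel_size // 2, groups=channels, bias=False)


def sweep_batch_sizes(make_conv, batch_sizes, channels=1536,
                      height=90, width=180, device=None):
    """Time one forward pass of make_conv(channels) per batch size.

    make_conv is any factory returning an nn.Module, so the same sweep
    can be run for nn.Conv2d and for a cutlass-based depthwise layer.
    Returns a dict mapping batch size to mean seconds per forward pass.
    """
    if torch is None:
        raise RuntimeError("PyTorch is required for sweep_batch_sizes")
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    sync = torch.cuda.synchronize if device == "cuda" else None
    conv = make_conv(channels).to(device).eval()
    results = {}
    with torch.no_grad():
        for bs in batch_sizes:
            x = torch.randn(bs, channels, height, width, device=device)
            results[bs] = time_fn(lambda: conv(x), sync=sync)
    return results
```

A call such as `sweep_batch_sizes(torch_depthwise, [1, 4, 16, 32])` gives the nn.Conv2d baseline; running the same sweep with a factory for the cutlass layer shows at which batch size, if any, it becomes faster. Synchronizing before reading the clock matters on GPU, since kernel launches return before the work finishes.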