ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

Initial porting of Op4dTensorGeneric #3404

Open novakovicdj opened 1 day ago

novakovicdj commented 1 day ago
| Case | ALL min | ALL max | ALL avg | SMALL (<8192) min | SMALL max | SMALL avg | MID (8192<= & <1048576) min | MID max | MID avg | BIG (>=1048576) min | BIG max | BIG avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALL COMBINED | 0.5848 | 1.8965 | 1.0142 | 0.6857 | 1.3725 | 1.006 | 0.7349 | 1.3333 | 1.0158 | 0.5848 | 1.8965 | 1.0145 |
| NxCxHxW-NxCxHxW | 0.8514 | 1.1985 | 1.013 | 0.9444 | 1.0588 | 1.0043 | 0.8514 | 1.1985 | 1.0108 | 0.9794 | 1.0542 | 1.0144 |
| NxCxHxW-NxCxHx1 | 0.7571 | 1.3585 | 1.0189 | 0.9273 | 1.0784 | 1.0074 | 0.8496 | 1.3333 | 1.0198 | 0.7571 | 1.3585 | 1.0194 |
| NxCxHxW-NxCx1xW | 0.5848 | 1.8965 | 1.0219 | 0.9464 | 1.0769 | 1.0107 | 0.786 | 1.3304 | 1.0247 | 0.5848 | 1.8965 | 1.0216 |
| NxCxHxW-NxCx1x1 | 0.8162 | 1.2716 | 1.0242 | 0.9388 | 1.0755 | 1.0053 | 0.8162 | 1.2716 | 1.0235 | 0.8233 | 1.2204 | 1.0266 |
| NxCxHxW-Nx1xHxW | 0.7872 | 1.3429 | 1.0183 | 0.9277 | 1.0667 | 1.0099 | 0.7872 | 1.3304 | 1.0212 | 0.8112 | 1.3429 | 1.0177 |
| NxCxHxW-Nx1xHx1 | 0.7108 | 1.7501 | 1.0188 | 0.9322 | 1.0667 | 1.0042 | 0.768 | 1.2642 | 1.0196 | 0.7108 | 1.7501 | 1.0201 |
| NxCxHxW-Nx1x1xW | 0.6268 | 1.785 | 1.0138 | 0.9355 | 1.234 | 1.0066 | 0.8703 | 1.2718 | 1.0171 | 0.6268 | 1.785 | 1.0131 |
| NxCxHxW-Nx1x1x1 | 0.7349 | 1.2864 | 1.0157 | 0.9231 | 1.0702 | 1.005 | 0.7349 | 1.2752 | 1.0166 | 0.8125 | 1.2864 | 1.0168 |
| NxCxHxW-1xCxHxW | 0.794 | 1.3025 | 1.0146 | 0.902 | 1.0909 | 1.0093 | 0.9028 | 1.2393 | 1.0177 | 0.794 | 1.3025 | 1.0137 |
| NxCxHxW-1xCxHx1 | 0.7619 | 1.3661 | 1.017 | 0.9375 | 1.0889 | 1.0058 | 0.768 | 1.3292 | 1.0196 | 0.7619 | 1.3661 | 1.017 |
| NxCxHxW-1xCx1xW | 0.5888 | 1.7401 | 1.012 | 0.9074 | 1.234 | 1.0069 | 0.8257 | 1.281 | 1.0159 | 0.5888 | 1.7401 | 1.0108 |
| NxCxHxW-1xCx1x1 | 0.7458 | 1.3443 | 1.0147 | 0.9216 | 1.0893 | 1.0047 | 0.8691 | 1.1832 | 1.0158 | 0.7458 | 1.3443 | 1.0155 |
| NxCxHxW-1x1xHxW | 0.7926 | 1.4048 | 1.0093 | 0.9259 | 1.2128 | 1.0055 | 0.8808 | 1.2599 | 1.0127 | 0.7926 | 1.4048 | 1.0081 |
| NxCxHxW-1x1xHx1 | 0.7263 | 1.4353 | 1.0096 | 0.9216 | 1.0851 | 1.0038 | 0.8178 | 1.3091 | 1.0113 | 0.7263 | 1.4353 | 1.0096 |
| NxCxHxW-1x1x1xW | 0.6173 | 1.5395 | 1.0066 | 0.9231 | 1.34 | 1.0077 | 0.8946 | 1.1493 | 1.0099 | 0.6173 | 1.5395 | 1.0048 |
| NxCxHxW-1x1x1x1 | 0.6857 | 1.3725 | 1.0092 | 0.6857 | 1.3725 | 1.0056 | 0.9686 | 1.0428 | 1.0083 | 0.9671 | 1.0477 | 1.0103 |
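
For readers who want to reproduce this kind of summary from their own measurements, here is a minimal sketch of the bucketing and the min/max/avg aggregation. It is not the script used for this PR. The thresholds 8192 and 1048576 come from the table header; the function names, the `(size, speedup)` input shape, and the sample values are hypothetical, and the sketch assumes `size` is whatever tensor size measure the table uses.

```python
from statistics import mean

SMALL_LIMIT = 8192     # table header: SMALL is < 8192
BIG_LIMIT = 1048576    # table header: BIG is >= 1048576


def size_category(size):
    """Return the size category name for a tensor size (hypothetical helper)."""
    if size < SMALL_LIMIT:
        return "SMALL"
    if size < BIG_LIMIT:
        return "MID"
    return "BIG"


def summarize(samples):
    """Group (size, speedup) pairs and return {category: (min, max, avg)}."""
    groups = {"ALL": []}
    for size, speedup in samples:
        groups["ALL"].append(speedup)
        groups.setdefault(size_category(size), []).append(speedup)
    return {
        name: (min(vals), max(vals), round(mean(vals), 4))
        for name, vals in groups.items()
    }


# Hypothetical measurements for illustration only, not taken from this PR.
samples = [(4096, 1.0), (8192, 0.9), (500000, 1.1), (1048576, 1.2)]
summary = summarize(samples)
```

Here `summary["ALL"]` aggregates every sample, which is why the ALL min and max in each table row match the extremes of the three size categories.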

This table compares the performance of the OpenCL (OCL) and HIP versions of the Op4dTensorGeneric kernel, reporting the minimum, maximum, and average speedup for each case. It covers all tensor sizes combined, as well as tensors divided into three size categories: