Closed fengggli closed 5 years ago
intel-caffe-thrd-1 (intelcaffe/r001hs)
intel-caffe-thrd-4 (intelcaffe/r002hs)
awnn-single-thread (awnn/r008hs)
** topdown tree
awnn-4-threads (awnn/r009hs)
After applying a few optimizations (3991d3d, 5b3e389) suggested by the Intel guide, single-thread time dropped from 540 ms to 380 ms.
Now I only need to vectorize im2col and col2im using AVX-512.
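For reference, here is a minimal scalar im2col sketch showing the kind of loop that would need vectorizing. This is not the awnn implementation: the CHW layout, the function names `im2col_chw` and `conv_out_dim`, and the parameter order are all assumptions made for illustration. The innermost loop walks the output row contiguously, which may let the compiler auto-vectorize it at stride 1 (gcc's `-fopt-info-vec` reports whether it did).

```c
#include <assert.h>
#include <stddef.h>

/* Output size of a convolution along one spatial dimension. */
static int conv_out_dim(int in, int kernel, int pad, int stride) {
  return (in + 2 * pad - kernel) / stride + 1;
}

/*
 * Hypothetical im2col for a single image in CHW layout (assumed layout).
 * col is row-major with shape (channels*kh*kw) x (out_h*out_w);
 * positions that fall into the zero padding are written as 0.
 */
static void im2col_chw(const float *img, int channels, int height, int width,
                       int kh, int kw, int pad, int stride, float *col) {
  assert(stride > 0 && kh > 0 && kw > 0 && pad >= 0);
  const int out_h = conv_out_dim(height, kh, pad, stride);
  const int out_w = conv_out_dim(width, kw, pad, stride);
  for (int c = 0; c < channels; ++c) {
    for (int ky = 0; ky < kh; ++ky) {
      for (int kx = 0; kx < kw; ++kx) {
        /* One row of col per (channel, kernel-y, kernel-x) triple. */
        float *dst = col + (size_t)((c * kh + ky) * kw + kx) * out_h * out_w;
        for (int oy = 0; oy < out_h; ++oy) {
          const int iy = oy * stride - pad + ky;
          float *dst_row = dst + (size_t)oy * out_w;
          if (iy < 0 || iy >= height) {
            /* Whole output row reads from vertical padding. */
            for (int ox = 0; ox < out_w; ++ox) dst_row[ox] = 0.0f;
            continue;
          }
          const float *src_row = img + ((size_t)c * height + iy) * width;
          /* Contiguous writes; reads are contiguous when stride == 1. */
          for (int ox = 0; ox < out_w; ++ox) {
            const int ix = ox * stride - pad + kx;
            dst_row[ox] = (ix >= 0 && ix < width) ? src_row[ix] : 0.0f;
          }
        }
      }
    }
  }
}
```

col2im would be the reverse scatter-add over the same index pattern. An explicit AVX-512 version would replace the innermost loop with 16-float loads and stores plus a masked tail.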
Machine configuration: see https://github.com/fengggli/gpu-computing-materials/issues/58 Note: the default gcc 4.8 doesn't support AVX-512, so I built gcc 7.3 and am using it with
-march=skylake-avx512
Test: resnet8 forward/backward time at batch size 256, for 10 iterations
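A measurement like this can be sketched as a small timing harness. It is only an illustration: `step_fn`, `avg_step_ms`, and the placeholder `count_step` are hypothetical names, not awnn or intel-caffe APIs, and whether the numbers in this issue are per-iteration averages is an assumption. The harness runs one untimed warm-up step, then averages wall-clock time over the requested iterations.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical signature for one forward/backward pass over a batch. */
typedef void (*step_fn)(void *ctx);

/* Milliseconds between two monotonic timestamps. */
static double elapsed_ms(struct timespec a, struct timespec b) {
  return (double)(b.tv_sec - a.tv_sec) * 1e3 +
         (double)(b.tv_nsec - a.tv_nsec) / 1e6;
}

/* Run one untimed warm-up step, then return the mean time per step in ms. */
static double avg_step_ms(step_fn step, void *ctx, int iters) {
  struct timespec t0, t1;
  assert(iters > 0);
  step(ctx); /* warm-up: touches memory and caches, not timed */
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (int i = 0; i < iters; ++i) step(ctx);
  clock_gettime(CLOCK_MONOTONIC, &t1);
  return elapsed_ms(t0, t1) / iters;
}

/* Placeholder step that only counts calls; stands in for the real network. */
static void count_step(void *ctx) { ++*(int *)ctx; }
```

In use, `count_step` would be replaced by a function that runs one resnet8 forward/backward pass on a 256-image batch, called as `avg_step_ms(real_step, net, 10)`.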
Notes about intel-caffe
refs