cooooorn / Pytorch-XNOR-Net

XNOR-Net with binary GEMM and binary conv2d kernels, supporting both CPU and GPU.
BSD 3-Clause "New" or "Revised" License

Further optimize gemm #13

Open MJChku opened 3 years ago

MJChku commented 3 years ago

Thanks for your great work!

I plan to work on BNN optimization as well, for various applications (generative models, classifiers) on a powerful CPU. I spent a few hours on preliminary work changing the "micro_kernel" to use AVX-512, and it showed a 4x speedup from a simple single-loop optimization (note that -O3 does not auto-vectorize this loop). Do you plan to work on this further and boost the performance more?
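For readers unfamiliar with what such a kernel computes, here is a minimal portable sketch of the XNOR-and-popcount inner loop that a binary GEMM is built on. It is not the repository's actual "micro_kernel"; the names `binary_dot` and `binary_gemm`, the bit packing (bit 1 = +1, bit 0 = -1, 64 elements per word, no padding bits), and the transposed layout of `B` are all assumptions made for illustration. `__builtin_popcountll` is a GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two {-1,+1} vectors packed one bit per element
 * (assumed encoding: bit 1 = +1, bit 0 = -1; every bit is a valid element). */
static int binary_dot(const uint64_t *a, const uint64_t *b, size_t n_words)
{
    int matches = 0;
    for (size_t i = 0; i < n_words; ++i) {
        /* XNOR sets a bit wherever the two signs agree. */
        matches += __builtin_popcountll(~(a[i] ^ b[i]));
    }
    /* agreements minus disagreements = 2 * matches - total_bits */
    return 2 * matches - (int)(64 * n_words);
}

/* Hypothetical reference GEMM: C[m][n] = dot(A row m, B row n).
 * B is assumed to be stored transposed, one packed row per output column. */
static void binary_gemm(const uint64_t *A, const uint64_t *B, int *C,
                        size_t M, size_t N, size_t K_words)
{
    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            C[m * N + n] = binary_dot(A + m * K_words, B + n * K_words, K_words);
        }
    }
}
```

An AVX-512 variant of the inner loop would process eight 64-bit words per iteration. On CPUs with the AVX512_VPOPCNTDQ extension, the per-word popcount could be done with the `_mm512_popcnt_epi64` intrinsic. Whether that reproduces the 4x figure above depends on the hardware and on the surrounding loop structure.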