X86_64 back-end Kernel Implementation details

Hi, I'd like to know how NNPACK implements the kernels used in convolution inference. From reading the code, I see that the computation is done by the kernels in src/x86_64. For example, the direct convolution is computed by the kernel in src/x86_64/blas/conv1x1.py, which is built on the PeachPy framework (if my understanding is right). Most of the PeachPy-based code is straightforward and easy to understand, but some parts are hard to follow. Is there any documentation on PeachPy or on the kernel implementation details? Thanks.

Maratyszcza / NNPACK

X86_64 back-end Kernel Implementation details #147