-
You have this cool function for dividing by integers using a bitshift after first multiplying by another number, so you're not limited to dividing by powers of 2, as described in https://gmplib.org/~te…
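For readers who haven't seen the trick, here is a minimal sketch of multiply-then-shift division, assuming unsigned 32-bit inputs and the round-up choice of multiplier. The names `magic_for` and `fast_div` are hypothetical, and this is not necessarily how the function in this repository is written.

```python
def magic_for(d, bits=32):
    """Return (magic, shift) so that n // d == (n * magic) >> shift
    for every unsigned n below 2**bits. Hypothetical helper name."""
    if d < 1:
        raise ValueError("divisor must be a positive integer")
    # ceil(log2(d)); zero when d == 1
    extra = (d - 1).bit_length()
    shift = bits + extra
    # ceil(2**shift / d), computed with integer arithmetic only
    magic = -(-(1 << shift) // d)
    return magic, shift


def fast_div(n, magic, shift):
    """Divide by the precomputed divisor: one multiply, one right shift."""
    return (n * magic) >> shift


# The constants are computed once per divisor and reused for every n.
magic, shift = magic_for(7)
assert fast_div(100, magic, shift) == 100 // 7
```

Note that `magic` can need `bits + 1` bits, so a fixed-width implementation in C or OpenCL needs a wider intermediate product or an extra fix-up step; Python's arbitrary-precision integers hide that detail here.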
-
Here is a set of more accurate transforms for the Winograd F(6x6,3x3) implementation. In my tests they are 100x more numerically stable than the old transforms (from the first draft of my paper), and…
-
Hi,
How should I cite the Winograd kernels?
-
I would like to calculate utilization, but I can't find the AVX2 frequency for the i7-6700K. For example, if the AVX2 frequency were 4.0 GHz (which it isn't), the max FLOP/s would be:
32 FLOP/clock \* 4.0GHz \* 4core…
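
The same arithmetic as a small sketch, using the figures quoted above (32 FLOP/clock, the hypothetical 4.0 GHz, 4 cores). The function names and the measured throughput in the usage line are made up for illustration.

```python
def peak_gflops(flop_per_clock, freq_ghz, cores):
    """Theoretical peak in GFLOP/s: FLOP per clock per core x GHz x cores."""
    return flop_per_clock * freq_ghz * cores


def utilization(measured_gflops, peak):
    """Fraction of the theoretical peak that a kernel actually reaches."""
    return measured_gflops / peak


# 4.0 GHz is the placeholder from the question, not the real AVX2 clock.
peak = peak_gflops(32, 4.0, 4)
print(peak)  # 512.0

# 300.0 GFLOP/s is an invented measurement, only to show the call.
print(utilization(300.0, peak))
```

Once the real sustained AVX2 frequency is known, only the `freq_ghz` argument changes.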
-
Hi Fabian @naibaf7, how can I plug your convolutional layer into other frameworks? What format does it expect the tensor data to be in? Can I just send in a cl_mem object in a certain format? Presumabl…
-
Besides #3, I have a question about the matrix multiply. How does its performance compare to the hardcoded library functions? If it is something like a third, then it might be possible to speed it up by…