Closed millardjn closed 8 years ago
Hi, I'm very interested in seeing your code either way. I use SNB too, so I can't give you much more testing.
Threading is in scope for this library, but I'm unsure if rayon is a good fit. We're in a good situation to simply set up a fixed number of threads.
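To illustrate what "a fixed number of threads" could look like, here is a minimal sketch that splits the rows of the output matrix across a fixed thread count using the standard library's scoped threads. The function `matmul_fixed_threads` and its parameters are hypothetical names chosen for this example and are not part of the library; the inner loop is a naive product, not the library's packed kernel.

```rust
use std::thread;

/// Hypothetical helper (not the library's API): computes C = A * B for
/// row-major matrices, with A of size m x k, B of size k x n, C of size m x n.
/// The rows of C are divided into at most `threads` contiguous blocks and
/// each block is computed on its own scoped thread.
fn matmul_fixed_threads(
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
    m: usize,
    k: usize,
    n: usize,
    threads: usize,
) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    assert!(threads >= 1);
    if m == 0 || n == 0 {
        return;
    }

    // Ceiling division so that every row is assigned to some block.
    let rows_per_block = (m + threads - 1) / threads;

    thread::scope(|s| {
        // Each chunk of C is a disjoint mutable slice, so no locking is needed.
        for (block, c_block) in c.chunks_mut(rows_per_block * n).enumerate() {
            let first_row = block * rows_per_block;
            s.spawn(move || {
                for (i, c_row) in c_block.chunks_mut(n).enumerate() {
                    let row = first_row + i;
                    let a_row = &a[row * k..(row + 1) * k];
                    for (j, out) in c_row.iter_mut().enumerate() {
                        let mut acc = 0.0f32;
                        for p in 0..k {
                            acc += a_row[p] * b[p * n + j];
                        }
                        *out = acc;
                    }
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}
```

Because the work is divided statically and the thread count is known up front, a sketch like this needs no work-stealing scheduler, which may be part of why a fixed thread setup is attractive here.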
Closing since this is inactive (an inactivity that was started by me, I'm sorry!)
Hi Bluss,
I've fiddled with the library a bit and managed to get a ~25% performance boost on Sandy Bridge. I had to rearrange things a bit to get it to work (LLVM is truly capricious), so I thought I'd see if it works for other setups before sending a PR.
I was also thinking that the ~b packing could be combined into a single step with im2col for reasonably fast low memory convolutions. I might give it a go some time soon along with rayon multithreading. Any thoughts on the intended scope for the library?
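For context on the im2col idea, here is a minimal sketch of a plain im2col for a single-channel image with stride 1 and no padding. The function name `im2col` and its signature are assumptions made for this example, not anything the library provides. The fused variant discussed above would write these same values straight into the packed buffer instead of materialising the full matrix, which is where the memory saving would come from.

```rust
/// Hypothetical helper (not the library's API): unrolls every kh x kw patch
/// of a row-major h x w image into one column of the output matrix.
/// The result is row-major with kh * kw rows and out_h * out_w columns,
/// so a convolution becomes a (1 x kh*kw) by (kh*kw x out_h*out_w) product.
fn im2col(input: &[f32], h: usize, w: usize, kh: usize, kw: usize) -> Vec<f32> {
    assert_eq!(input.len(), h * w);
    assert!(kh >= 1 && kw >= 1 && kh <= h && kw <= w);

    // Output size for stride 1 and no padding.
    let out_h = h - kh + 1;
    let out_w = w - kw + 1;
    let cols = out_h * out_w;

    let mut out = vec![0.0f32; kh * kw * cols];
    for ky in 0..kh {
        for kx in 0..kw {
            // One output row per position inside the kernel window.
            let row = ky * kw + kx;
            for oy in 0..out_h {
                for ox in 0..out_w {
                    // One output column per position of the window on the image.
                    let col = oy * out_w + ox;
                    out[row * cols + col] = input[(oy + ky) * w + (ox + kx)];
                }
            }
        }
    }
    out
}
```

For a 3x3 image and a 2x2 kernel this produces a 4x4 matrix, so the duplicated data already exceeds the 9 input values; avoiding that intermediate copy is the motivation for combining the step with packing.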