DoktorMike opened this issue 2 years ago
The largest model I ever tried was LeNet on MNIST. At just over 44k parameters, it is a far cry from your 5 million, so I have no idea how SimpleChains would perform on a model that large. Benchmark results for MNIST were shared in https://julialang.org/blog/2022/04/simple-chains/ . At that size, SimpleChains still did substantially better than the competition on the CPU, and it remained competitive with Flux running on a very beefy GPU.
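To see where the "just over 44k" figure comes from, here is a small Python sketch that counts the parameters of a LeNet-5-style model on 28×28 MNIST images. The layer layout (two 5×5 convolutions with 6 and 16 output channels, 2×2 max pooling after each, then dense layers 256→120→84→10) is an assumption based on the classic LeNet-5 design, not something copied from the SimpleChains source, and the helper names are hypothetical.

```python
# Hypothetical helpers: count trainable parameters of a LeNet-5-style model.
# Assumed layout (classic LeNet-5 on 28x28x1 MNIST), not taken from SimpleChains.

def conv_params(k_h, k_w, c_in, c_out):
    """Kernel weights of a k_h x k_w convolution plus one bias per output channel."""
    return k_h * k_w * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weight matrix plus bias vector of a fully connected layer."""
    return n_in * n_out + n_out

def lenet5_params():
    # 28x28x1 -> 5x5 conv (6 ch) -> 24x24x6 -> 2x2 max pool -> 12x12x6
    # -> 5x5 conv (16 ch) -> 8x8x16 -> 2x2 max pool -> 4x4x16 = 256 features.
    # Pooling layers have no trainable parameters, so they are not counted.
    layers = [
        conv_params(5, 5, 1, 6),        # 156
        conv_params(5, 5, 6, 16),       # 2416
        dense_params(4 * 4 * 16, 120),  # 30840
        dense_params(120, 84),          # 10164
        dense_params(84, 10),           # 850
    ]
    return sum(layers)

print(lenet5_params())  # 44426
```

Almost all of those parameters sit in the first dense layer, which is why a model of this kind stays in the tens of thousands rather than the millions.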
But with 5 million parameters, you're almost certainly better off on a GPU, which isn't supported by SimpleChains at the moment.
If you want GPU support, I'd also suggest taking a look at Lux.jl: https://github.com/avik-pal/Lux.jl
As a user coming to this repo, you are struck by the awesome speedup compared to Flux, but you are still left wondering what "small" really means. The network in the example is indeed small. Is there some limit on the number of parameters and/or the depth/width of a network that we could state?
For example, a fully connected feed-forward network with 5 million parameters might show no speed gains, while one with 5,000 parameters does. I'm making up numbers here, but I hope you get the gist.
I know it's hard, but I think guidelines like this would help new users evaluate whether they should use SimpleChains or Flux for their problem.
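To make the 5,000 versus 5 million comparison concrete in terms of depth and width, the Python sketch below counts weights and biases of a fully connected feed-forward network from its layer widths. The two example networks are hypothetical MNIST-shaped choices (784 inputs, 10 outputs) picked only to land near the sizes mentioned above. They say nothing about where SimpleChains stops being faster than Flux.

```python
# Hypothetical helper: parameter count of a dense feed-forward network.
# Each layer contributes n_in * n_out weights plus n_out biases.

def mlp_params(widths):
    """Total weights + biases; widths lists layer sizes from input to output."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

# Two illustrative MNIST-sized networks (784 inputs, 10 outputs):
small = [784, 6, 10]           # one narrow hidden layer
large = [784, 2048, 2048, 10]  # two wide hidden layers

print(mlp_params(small))  # 4780
print(mlp_params(large))  # 5824522
```

Because each hidden-to-hidden layer costs roughly width² parameters, widening the hidden layers grows the count much faster than adding another narrow layer, so "small" is probably better phrased in terms of widths than depth alone.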