Closed — kopant closed this issue 6 months ago
Hi @kopant, I believe the code is working as intended! Each successive layer of the MLP has dim // division dimensionality. So if we have 1024 input dimensions, our MLP layers would go 1024 (input) -> 256 -> 64 -> 16 -> 4 -> 1 (output).
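To make that schedule concrete, here is a minimal sketch of the shrinking logic as described above. The exact signature and stopping rule of layer_utils.calc_mlp_dims may differ; this is illustrative, not the library's actual code.

```python
def calc_mlp_dims(input_dim, division=4, output_dim=1):
    """Sketch: repeatedly integer-divide the width by `division`.

    Returns the hidden-layer sizes; the final projection to
    `output_dim` is assumed to be appended by the model itself.
    """
    dims = []
    dim = input_dim
    while dim > output_dim:
        dim = dim // division
        dims.append(dim)
    # Drop the last value, which has already reached output_dim.
    return dims[:-1]

print(calc_mlp_dims(1024, division=4))  # [256, 64, 16, 4]
```

With a final output layer appended, this gives the 1024 -> 256 -> 64 -> 16 -> 4 -> 1 progression.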
You can experiment with the division value to determine what works best for you, as different datasets might require different numbers of layers. If your input has many features, a larger division value would be better; if it has fewer features, a value of 2 might be best!
I hope that answers your question but am happy to answer any follow-ups!
Closing this issue due to lack of activity. Feel free to re-open it if you have more questions!
In layer_utils.calc_mlp_dims(), should the line "dim = dim // division" instead be "dim = dim - dim // division"? Since the documentation says 'division' is the factor by which successive MLP layer sizes are reduced, I'd expect 'division=4' to mean that the next layer size is reduced by a factor of 0.75, not to 0.25 of the previous size. I also think this interpretation is more practical: reducing by a default factor of 4 seems too steep, and one can easily end up with a single-layer MLP most of the time. Open to understanding more about how this design choice was made, though, if you did intend the code as written.
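For concreteness, the two readings produce very different schedules. A hedged sketch (the function names here are illustrative, not from the library):

```python
def shrink_current(dim, division=4):
    # As implemented: keep 1/division of the width each step.
    return dim // division

def shrink_proposed(dim, division=4):
    # As the question suggests: remove 1/division of the width each step.
    return dim - dim // division

def schedule(start, step_fn, stop=1):
    """Apply step_fn until the width reaches `stop` or stops changing."""
    dims = [start]
    while dims[-1] > stop:
        nxt = step_fn(dims[-1])
        if nxt == dims[-1]:  # guard against stalling (e.g. 3 - 3 // 4 == 3)
            break
        dims.append(nxt)
    return dims

print(schedule(1024, shrink_current))   # steep:  [1024, 256, 64, 16, 4, 1]
print(schedule(1024, shrink_proposed))  # gentle: [1024, 768, 576, 432, ...]
```

With division=4, the current code yields a 5-step pyramid, while the proposed formula yields a much more gradual taper of 20+ layers, which is the crux of the question.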