georgian-io / Multimodal-Toolkit

A multimodal model for text and tabular data, using HuggingFace transformers as the building block for the text data
https://multimodal-toolkit.readthedocs.io
Apache License 2.0

layer_utils.calc_mlp_dims() bug? #68

Closed: kopant closed this issue 6 months ago

kopant commented 8 months ago

In layer_utils.calc_mlp_dims(), should the line "dim = dim // division" instead be "dim = dim - dim // division"? The documentation says that 'division' is the factor by which successive MLP layer sizes are reduced, so I'd expect 'division=4' to mean that the next layer is reduced to 0.75 of the previous size, not to 0.25 of it. This interpretation also seems more practical: a default reduction factor of 4 is quite steep, and one can easily end up with a single-layer MLP most of the time. That said, I'm open to understanding how this design choice was made, if the code is indeed intended as written.
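
For concreteness, here is a small self-contained sketch contrasting the two reduction rules (the helper names are made up for illustration and are not part of the toolkit):

```python
# Sketch only: contrasts the rule currently in calc_mlp_dims with the proposed
# reading of "reduce by a factor of 1/division", for a 1024-dim input.

def next_dim_as_written(dim: int, division: int) -> int:
    # Current behaviour: the next layer keeps 1/division of the previous width.
    return dim // division

def next_dim_as_proposed(dim: int, division: int) -> int:
    # Proposed reading: the next layer is reduced BY 1/division,
    # i.e. it keeps (division - 1) / division of the previous width.
    return dim - dim // division

dims_written, dims_proposed = [1024], [1024]
for _ in range(3):
    dims_written.append(next_dim_as_written(dims_written[-1], 4))
    dims_proposed.append(next_dim_as_proposed(dims_proposed[-1], 4))

print(dims_written)   # [1024, 256, 64, 16]
print(dims_proposed)  # [1024, 768, 576, 432]
```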

akashsaravanan-georgian commented 8 months ago

Hi @kopant, I believe the code is working as intended! Each successive layer of the MLP will have dim // division dimensionality. So if we have 1024 input dimensions, our MLP layers would go from 1024 (input) -> 256 -> 64 -> 16 -> 1 (output).
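
For reference, here is a rough sketch of the schedule described above. It is an approximation of the rule ("each successive hidden layer keeps dim // division of the previous width"), not the toolkit's actual calc_mlp_dims; the exact stopping condition in layer_utils.py may differ slightly:

```python
# Sketch: repeatedly divide the width by `division`, keeping each intermediate
# width that is still larger than the output dimensionality.

def calc_mlp_dims_sketch(input_dim: int, division: int = 2, output_dim: int = 1) -> list:
    dims = []
    dim = input_dim // division
    while dim > output_dim:
        dims.append(dim)
        dim = dim // division
    return dims

# Hidden-layer widths between the input and the final output projection:
print(calc_mlp_dims_sketch(1024, division=4))  # [256, 64, 16, 4]
print(calc_mlp_dims_sketch(1024, division=2))  # [512, 256, 128, 64, 32, 16, 8, 4, 2]
```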

You can experiment with the value to determine what works best for you, as different datasets might require different numbers of layers. If your input has a lot of features, a higher multiple of 2 would be better; if you have fewer features, a value of 2 might be best!

I hope that answers your question, but I'm happy to answer any follow-ups!

akashsaravanan-georgian commented 6 months ago

Closing this issue due to lack of activity. Feel free to re-open it if you have more questions!