Closed: yvokeller closed this 3 months ago.
With a factor of 4, this approach reduces the neck size significantly: from 84,953,088 trainable parameters down to 10,641,024, roughly 8x fewer (about an 87.5% reduction).
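As a quick sanity check, the snippet below computes the size of the reduction from the two trainable-parameter counts quoted above. It only does the arithmetic. How the counts were obtained is an assumption: in PyTorch they are typically computed with `sum(p.numel() for p in model.parameters() if p.requires_grad)`, but the PR does not say which framework or counting method was used.

```python
# Compare the two trainable-parameter counts quoted in the comment.
before = 84_953_088  # trainable params before the change
after = 10_641_024   # trainable params with factor 4

ratio = before / after          # how many times smaller the neck is
reduction = 1 - after / before  # fraction of trainable params removed

print(f"{ratio:.2f}x fewer parameters ({reduction:.1%} reduction)")
# prints: 7.98x fewer parameters (87.5% reduction)
```

Note that the resulting ratio is close to 8x, not 4x, so "factor 4" presumably refers to a setting inside the neck rather than directly to the overall parameter ratio.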