NVIDIA / sentiment-discovery

Unsupervised Language Modeling at scale for robust sentiment classification

Add MoS (Mixture of Softmaxes) option for next-char prediction #11

Open: moscow25 opened this issue 6 years ago

moscow25 commented 6 years ago

http://smerity.com/articles/2017/mixture_of_softmaxes.html

I implemented it. It does not noticeably help bits per character (BPC) or sentiment transfer at an RNN hidden size of 1024, but we should share the code anyway. MoS may be more helpful with a larger output softmax, such as word parts (the character softmax has only 64 outputs). Might as well have it in there.
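For reference, here is a minimal PyTorch sketch of an MoS output layer, assuming the formulation from Yang et al. (2017) described in the linked article: the hidden state h_t is projected to K context vectors, each goes through a shared decoder and its own softmax, and the K distributions are mixed with weights predicted from h_t. This is not the implementation referred to in this issue; the class name `MixtureOfSoftmaxes`, the parameter `n_experts`, and K = 4 are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureOfSoftmaxes(nn.Module):
    """Sketch of an MoS output layer (Yang et al., 2017); names are hypothetical."""

    def __init__(self, hidden_size, vocab_size, n_experts):
        super().__init__()
        self.hidden_size = hidden_size
        self.n_experts = n_experts
        # Mixture weights: one logit per softmax component.
        self.prior = nn.Linear(hidden_size, n_experts, bias=False)
        # Maps h_t to K component-specific context vectors.
        self.latent = nn.Linear(hidden_size, n_experts * hidden_size)
        # Output projection shared by all K components.
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, h):
        # h: (batch, hidden_size) -> log-probabilities of shape (batch, vocab_size)
        batch = h.size(0)
        contexts = torch.tanh(self.latent(h)).view(batch, self.n_experts, self.hidden_size)
        # Per-component log-softmax over the vocabulary: (batch, K, V)
        log_probs = F.log_softmax(self.decoder(contexts), dim=-1)
        # Log mixture weights, broadcast over the vocabulary: (batch, K, 1)
        log_prior = F.log_softmax(self.prior(h), dim=-1).unsqueeze(-1)
        # log sum_k pi_k * softmax_k(...), computed stably in log space
        return torch.logsumexp(log_prior + log_probs, dim=1)


# Usage: hidden size 1024 and a 64-way character softmax, as in this issue;
# K = 4 components is an arbitrary choice for illustration.
torch.manual_seed(0)
mos = MixtureOfSoftmaxes(hidden_size=1024, vocab_size=64, n_experts=4)
h = torch.randn(8, 1024)
log_p = mos(h)
targets = torch.randint(0, 64, (8,))
loss = F.nll_loss(log_p, targets)  # the layer already returns log-probabilities
```

Because the mixing happens over probabilities rather than logits, the output is no longer a single softmax of a linear function of h_t, which is how MoS sidesteps the "softmax bottleneck" discussed in the article. For an RNN that emits (seq_len, batch, hidden) outputs, flatten to (seq_len * batch, hidden) before calling the layer.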

It does increase memory usage, especially for large hidden states.
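A back-of-the-envelope count shows why the cost grows with the hidden size. It assumes the standard MoS layout: a d → K·d projection (with bias) that produces K context vectors, a d → K prior (weights only), and one d → V decoder shared with the plain-softmax baseline. The function name `mos_overhead` and K = 4 are hypothetical, and the activation figure counts only forward-pass values per position, not everything autograd keeps around.

```python
def mos_overhead(hidden_size, vocab_size, n_experts):
    """Extra parameters and per-position activations of MoS vs. a single softmax."""
    d, V, K = hidden_size, vocab_size, n_experts
    # Latent projection d -> K*d (weights + bias) and prior d -> K (weights only).
    extra_params = (K * d * d + K * d) + K * d
    # K context vectors of size d, plus K - 1 extra distributions over V.
    extra_activations = K * d + (K - 1) * V
    return extra_params, extra_activations


for d in (1024, 4096):
    params, acts = mos_overhead(hidden_size=d, vocab_size=64, n_experts=4)
    print(f"d={d}: +{params:,} params, +{acts:,} activations per position")
# d=1024: +4,202,496 params, +4,288 activations per position
# d=4096: +67,141,632 params, +16,576 activations per position
```

With V = 64, the extra activations are dominated by the K·d context vectors rather than the K small softmaxes, and the extra parameters grow as K·d². At d = 1024 that is roughly 64 times the 65,600 parameters of a plain 64-way decoder (1024 × 64 weights + 64 biases).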