"Tiny LSTM" model used in recent ACL SyntaxGym papers. Based on PyTorch sample LSTM implementation. Relatively shallow stacked LSTM with dropout.
Are you the creator/co-creator of this language model? No :)
Are you the creator/co-creator of this implementation of this language model? Yes
Is this implementation the official implementation of the language model? No
What licensing restrictions (if any) apply to this implementation of this language model? None
Training
What corpus was this model trained on? BLLIP-LG, as described in Hu et al. 2020: 1.8M sentences, 42M tokens, 170K non-UNK types, 74 fine-grained UNK types.
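The "fine-grained UNK types" replace out-of-vocabulary words with class tokens that keep surface cues, rather than mapping everything to one generic UNK symbol. The exact 74-class scheme is not documented here; the sketch below only illustrates the general signature-based idea, and every feature and token name in it is an assumption.

```python
def unk_signature(word):
    """Map an out-of-vocabulary word to a fine-grained UNK token.

    Illustrative only: the real 74 classes used for BLLIP-LG are not
    documented here. Signatures like these typically key on
    capitalization, digits, and common suffixes.
    """
    sig = "<unk"
    if word[:1].isupper():
        sig += "-cap"
    if any(ch.isdigit() for ch in word):
        sig += "-num"
    if word.endswith("ing"):
        sig += "-ing"
    elif word.endswith("ed"):
        sig += "-ed"
    return sig + ">"
```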
What task was this model trained on? Next-word prediction.
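Next-word prediction means the loss is per-token cross-entropy between the model's distribution over the vocabulary and the actual next token. A minimal sketch of that objective, assuming a model that maps token ids to logits:

```python
import torch.nn.functional as F

def next_word_loss(model, tokens):
    """Per-token cross-entropy for next-word prediction.

    tokens: (batch, seq_len) LongTensor of word ids; the model is
    assumed to map (batch, n) ids to (batch, n, vocab) logits.
    """
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from the prefix ending at t
    logits = model(inputs)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
```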
Complexity:
2 layers
256 units per hidden layer
256 embedding units
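Taken together, the configuration above amounts to a small stacked LSTM in the style of the PyTorch sample implementation. A minimal PyTorch sketch follows; the layer sizes come from the list above, while the vocabulary size and dropout rate are assumptions:

```python
import torch.nn as nn

class TinyLSTM(nn.Module):
    """Embedding -> 2-layer LSTM -> linear decoder over the vocabulary."""

    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256,
                 num_layers=2, dropout=0.5):  # dropout rate is an assumption
        super().__init__()
        self.drop = nn.Dropout(dropout)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers,
                            dropout=dropout, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # Carrying recurrent state across batches is omitted for brevity.
        emb = self.drop(self.embed(tokens))
        out, _ = self.lstm(emb)
        return self.decoder(self.drop(out))  # (batch, seq_len, vocab) logits
```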
Performance: 57.09 perplexity on held-out BLLIP-LG data from Hu et al. 2020.
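For reference, perplexity is the exponentiated average per-token negative log-likelihood on held-out data, so the reported figure corresponds to roughly 4.04 nats per token:

```python
import math

def perplexity(total_nll_nats, num_tokens):
    """exp of the average per-token negative log-likelihood."""
    return math.exp(total_nll_nats / num_tokens)

# An average of about 4.045 nats/token gives exp(4.045) ≈ 57.1,
# the scale of the 57.09 reported above.
```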
Other notes
NB: tests will probably fail, since this model supports the mount_checkpoint feature in advance of that feature being available on the develop branch.
Licensing
MIT