Script for downloading and pre-processing wikitext datasets

NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP

https://nvidia.github.io/OpenSeq2Seq

Apache License 2.0


Script for downloading and pre-processing wikitext datasets #494

Closed. gioannides closed this 5 years ago

gioannides commented 5 years ago

Adds a script that automatically downloads and pre-processes the wikitext-103 and wikitext-2 datasets for the LSTM language models, producing files in the format described in https://nvidia.github.io/OpenSeq2Seq/html/language-model.html. This makes it easier to generate the files needed to train the LSTM language models.
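For readers who want a feel for what such a script involves, here is a minimal sketch of the download-and-locate step. It is not the code from this PR. The function names, the `SPLITS` tuple, and the assumption that the archive contains files named like `wiki.train.tokens`, `wiki.valid.tokens`, and `wiki.test.tokens` are all illustrative, and the archive URL is left as a parameter because the thread does not state it.

```python
import urllib.request
import zipfile
from pathlib import Path

# Assumed split names; the files are assumed to look like "wiki.<split>.tokens".
SPLITS = ("train", "valid", "test")


def download_and_extract(url, dest_dir):
    """Download a zip archive from `url` and unpack it into `dest_dir`.

    Hypothetical helper: `url` must be supplied by the caller.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "dataset.zip"
    urllib.request.urlretrieve(url, str(archive))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest


def find_split_files(root):
    """Return {split: path} for the first '*.tokens' file matching each split."""
    found = {}
    for path in sorted(Path(root).rglob("*.tokens")):
        for split in SPLITS:
            if split in path.name and split not in found:
                found[split] = path
    missing = [s for s in SPLITS if s not in found]
    if missing:
        raise FileNotFoundError(f"missing splits: {missing}")
    return found
```

A caller would pass the dataset's archive URL to `download_and_extract` and then hand the paths returned by `find_split_files` to whatever pre-processing the language-model documentation requires. That formatting step is left out here because its details are not given in this thread.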

vsl9 commented 5 years ago

Thank you for the PR.