We now have a PR for the new version of GluonNLP: https://github.com/dmlc/gluon-nlp/pull/1225, which refactors the major APIs and relies on the DeepNumpy interface in MXNet.
In particular, we refactored the way users download and prepare the common NLP datasets. Previously, users relied on Python: they created an XXXDataset object and accessed the data through it.
Now, we have switched to the new nlp_data + nlp_preprocess CLI commands, which download and prepare the datasets.
We can enhance the dataset support by adding:
In addition, we will consider moving some of the datasets to our internal S3 storage, which would offer faster download speeds (if the licenses allow).