I can make room for this topic.
Shall we consider the LAMB optimizer (https://arxiv.org/abs/1904.00962)?
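For context, LAMB rescales each layer's Adam-style update by a trust ratio of parameter norm to update norm. A simplified sketch for a single weight tensor (bias correction and the clipping function on the weight norm are omitted, so this is an illustration rather than the paper's exact algorithm):

```python
import numpy as np

def lamb_step(w, g, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-6, wd=0.01):
    """One simplified LAMB update for a single weight tensor."""
    m = beta1 * m + (1 - beta1) * g           # first moment (Adam-style)
    v = beta2 * v + (1 - beta2) * g * g       # second moment
    update = m / (np.sqrt(v) + eps) + wd * w  # Adam direction plus decoupled weight decay
    trust_ratio = np.linalg.norm(w) / (np.linalg.norm(update) + eps)  # layer-wise adaptation
    w = w - lr * trust_ratio * update
    return w, m, v
```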
@szha: #677
So far, GluonNLP's building blocks are quite solid, but natural language processing tasks often need task-specific models. Could we add implementations of some standard, classic models, so that anyone with a related task can call them directly for a quick experiment?
A distributed training module may be a good feature, as many current SOTA NLP models require a lot of GPUs. pytorch/fairseq also supports distributed training across multiple machines.
I think we can add some text matching models: #616
JSON/YAML config files that specify models and experiments would be great. That's a very popular feature of AllenNLP and other toolkits.
https://github.com/dmlc/gluon-nlp/issues/392
For an example AllenNLP config file, see https://github.com/allenai/allennlp/blob/master/tutorials/tagger/experiment.jsonnet
See also their tutorial: https://github.com/allenai/allennlp/blob/master/tutorials/tagger/README.md#using-config-files
Quote:
This means that most of your experiment can be specified declaratively in a separate configuration file, which serves as a record of exactly what experiments you ran with which parameters. Now you can change various aspects of your model without writing any code. For instance, if you wanted to use a GRU instead of an LSTM, you'd just need to change the appropriate entry in the configuration file.
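As an illustration, a declarative setup on top of Gluon building blocks could look like the sketch below. The config schema and the build_encoder helper are hypothetical, not an existing GluonNLP API; the point is just that swapping "lstm" for "gru" in the file changes the model without touching code:

```python
import json
import mxnet as mx

# Hypothetical experiment config -- changing "lstm" to "gru" below swaps the encoder.
CONFIG = json.loads("""
{
  "encoder": {"type": "lstm", "hidden_size": 256, "num_layers": 2},
  "optimizer": {"name": "adam", "learning_rate": 0.001}
}
""")

_ENCODERS = {"lstm": mx.gluon.rnn.LSTM, "gru": mx.gluon.rnn.GRU}

def build_encoder(cfg):
    """Instantiate the encoder named in the config (hypothetical helper)."""
    return _ENCODERS[cfg["type"]](cfg["hidden_size"], num_layers=cfg["num_layers"])

encoder = build_encoder(CONFIG["encoder"])
optimizer_args = CONFIG["optimizer"]
```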
Add "Deep Relevance Matching Model" for relevance matching Here is one implementation in keras: https://github.com/sebastian-hofstaetter/neural-ranking-drmm
Add the Simple Recurrent Unit (SRU): https://arxiv.org/pdf/1709.02755.pdf
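A rough NumPy sketch of the SRU recurrence, to show why it is cheap: the only recurrent computation is elementwise, and the matrix multiplies depend on the input alone. Single layer, per-timestep loop; the learned weights W, Wf, Wr and biases are taken as given:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_layer(X, W, Wf, bf, Wr, br):
    """Simple Recurrent Unit: elementwise recurrence plus a highway connection.

    X: (T, d) inputs; W, Wf, Wr: (d, d); bf, br: (d,).
    """
    T, d = X.shape
    c = np.zeros(d)
    H = np.zeros_like(X)
    for t in range(T):
        x_tilde = X[t] @ W                      # input transformation (no recurrence)
        f = sigmoid(X[t] @ Wf + bf)             # forget gate
        r = sigmoid(X[t] @ Wr + br)             # reset gate
        c = f * c + (1 - f) * x_tilde           # elementwise cell state update
        H[t] = r * np.tanh(c) + (1 - r) * X[t]  # highway/skip connection
    return H
```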
ERNIE: BERT with Knowledge pretraining: https://arxiv.org/pdf/1904.09223.pdf
Topic models like LDA. Lemmatization and stemming for preprocessing. Expose sparse n-gram or bag-of-words (BOW) representations of sentences/documents to the user.
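For the last point, a sparse n-gram/BOW representation can be as simple as a mapping from n-grams to counts; a minimal sketch (the bow_ngrams helper is illustrative, not a proposed API):

```python
from collections import Counter

def bow_ngrams(tokens, n_max=2):
    """Sparse bag-of-words / n-gram counts for one document, keyed by n-gram tuple."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

print(bow_ngrams("the cat sat on the mat".split()))
```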
MASS: Masked Sequence to Sequence Pre-training for Language Generation https://arxiv.org/pdf/1905.02450.pdf
Pay Less Attention With Lightweight and Dynamic Convolutions https://arxiv.org/pdf/1901.10430.pdf
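The core operation in that paper, the lightweight convolution, is a depthwise convolution whose kernel weights are softmax-normalized and shared across heads. A slow but explicit NumPy sketch of the causal variant:

```python
import numpy as np

def lightweight_conv(x, kernel, num_heads):
    """Causal lightweight convolution: softmax-normalized depthwise kernel per head.

    x: (T, C) sequence; kernel: (num_heads, K) raw weights; C divisible by num_heads.
    """
    T, C = x.shape
    H, K = kernel.shape
    w = np.exp(kernel - kernel.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)        # softmax over kernel positions
    x_pad = np.pad(x, ((K - 1, 0), (0, 0)))     # causal (left) padding
    out = np.zeros_like(x)
    cph = C // H                                # channels per head share one kernel
    for h in range(H):
        cols = slice(h * cph, (h + 1) * cph)
        for t in range(T):
            out[t, cols] = w[h] @ x_pad[t:t + K, cols]  # weighted sum over the window
    return out
```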
Poincaré Word Embedding: https://arxiv.org/pdf/1705.08039.pdf
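For reference, these embeddings live in the unit ball and are trained with the Poincaré distance; a minimal NumPy sketch of that distance:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance between two points strictly inside the unit (Poincare) ball."""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq_dist / (denom + eps))
```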
GPT-2 Training from Scratch
Transformer/BERT visualization: [arXiv 2019] Visualizing and Measuring the Geometry of BERT; [ICML 2017] Understanding Black-box Predictions via Influence Functions
Memory networks
Adaptive softmax and adaptive embedding https://arxiv.org/pdf/1809.10853.pdf
Adaptive softmax and embedding are already part of scripts/language_model. Let's clarify the roadmap item as: move the support into the main API.
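For reference, the core idea of adaptive softmax is a two-level hierarchy over a frequency-sorted vocabulary with cutoffs; a minimal NumPy sketch of the log-probability computation (the real implementation also projects the hidden state to smaller dimensions for the tail clusters, which is omitted here):

```python
import numpy as np

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def adaptive_softmax_logprob(h, target, head_W, tail_Ws, cutoffs):
    """Log-probability of `target` under a simplified two-level adaptive softmax.

    h: (d,) hidden state; cutoffs: e.g. [2000, 10000, vocab_size].
    head_W: (cutoffs[0] + n_tails, d) -- head words plus one token per tail cluster.
    tail_Ws[i]: (cutoffs[i+1] - cutoffs[i], d) -- words of tail cluster i.
    """
    n_tails = len(cutoffs) - 1
    head_logp = log_softmax(head_W @ h)
    if target < cutoffs[0]:
        return head_logp[target]                     # frequent word: head only
    for i in range(n_tails):
        if target < cutoffs[i + 1]:                  # find the tail cluster
            cluster_logp = head_logp[cutoffs[0] + i] # prob of entering the cluster
            tail_logp = log_softmax(tail_Ws[i] @ h)  # prob within the cluster
            return cluster_logp + tail_logp[target - cutoffs[i]]
    raise ValueError("target outside vocabulary")
```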
pointer mechanism for attention https://github.com/dmlc/gluon-nlp/issues/951
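A pointer mechanism reuses the attention distribution itself as an output distribution over source positions, rather than generating from a fixed vocabulary. A minimal dot-product sketch (the scoring function is kept deliberately simple; real models typically use a learned additive or bilinear score):

```python
import numpy as np

def pointer_attention(decoder_state, encoder_states):
    """Return a distribution over source positions ("pointing" instead of generating).

    decoder_state: (d,); encoder_states: (T, d).
    """
    scores = encoder_states @ decoder_state        # dot-product attention scores
    scores = scores - scores.max()                 # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()  # softmax over input positions
    return probs
```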
I can try TinyBERT.
We are working on the NumPy version of GluonNLP and will adjust the positioning of this package. We will consider the roadmap items here in related areas, as well as more recent research.
Hi,
Let's start a discussion here about the roadmap towards 0.10 and 1.0. We are looking for:
If you have any item that you'd like to propose for the roadmap, please do:
Features
The following features have been proposed for inclusion in GluonNLP 0.7.0 (subject to change):
Models
  Language modeling
  Word embedding
  NER
  Memory networks and transformer
  Visualization
  Quantization
  Multi-task/transfer learning
  Machine translation
  Tokenization
  Text classification
  Topic modeling
  Knowledge distillation
APIs
Scripts
Documentation
Demos
cc @dmlc/gluon-nlp-team
Related Projects