dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

[Discussion] Roadmap #654

Closed: szha closed this issue 4 years ago

szha commented 5 years ago

Hi,

Let's start a discussion here about the roadmap towards 0.10 and 1.0. If you have any item that you'd like to propose for the roadmap, please share it in this thread.

Features

The following features have been proposed for inclusion in GluonNLP 0.7.0 (subject to change):

Models

- Language modeling
- Word embedding
- NER
- Memory networks and transformer
- Visualization
- Quantization
- Multi-task/transfer learning
- Machine translation
- Tokenization
- Text classification
- Topic modeling
- Knowledge distillation

APIs

Scripts

Documentation

Demos

cc @dmlc/gluon-nlp-team

Related Projects

haven-jeon commented 5 years ago

#633 MTL (multi-task learning) is one of the most important topics in NLP. Implementing BigBird would be a good starting point for developing MTL in gluon-nlp.

I can make room for this topic.

Ishitori commented 5 years ago

Shall we consider the LAMB optimizer (https://arxiv.org/abs/1904.00962)?

@szha: #677
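For reference, a minimal NumPy sketch of the layer-wise update LAMB applies, following the paper; the function name and defaults here are illustrative, not a GluonNLP API:

```python
import numpy as np

def lamb_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """One LAMB step for a single parameter tensor w (sketch)."""
    m = beta1 * m + (1 - beta1) * grad            # first moment, as in Adam
    v = beta2 * v + (1 - beta2) * grad * grad     # second moment
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    update = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w
    # Layer-wise trust ratio: scale the step by ||w|| / ||update||.
    w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    return w - lr * trust * update, m, v
```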

szha commented 5 years ago

#657 BERT for XNLI/NMT. @fhieber and team have expressed interest in this.

vanewu commented 5 years ago

By now, GluonNLP has a fairly complete set of components, and there are many task-specific models for particular NLP problems. Can we also provide some standard classic models? Anyone with a related task could then call such a model directly for a quick experiment.
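Something in the spirit of the existing `get_model` entry point could serve this. A hedged sketch; the model name and signature follow the GluonNLP 0.x style and may differ by version:

```python
import gluonnlp as nlp

# Sketch: fetch a standard pretrained language model plus its vocabulary
# in one call (gluonnlp 0.x-style API; exact names may vary by version).
model, vocab = nlp.model.get_model('standard_lstm_lm_200',
                                   dataset_name='wikitext-2',
                                   pretrained=True)
```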

szhengac commented 5 years ago

A distributed training module may be a good feature, as many current SOTA NLP models typically require a lot of GPUs. pytorch/fairseq also supports distributed training across multiple machines.
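One possible shape for this is a minimal sketch of multi-machine data parallelism using MXNet's parameter-server kvstore (Horovod would be another backing). This assumes the script is launched once per worker via MXNet's distributed launcher; it will not synchronize when run standalone:

```python
import mxnet as mx
from mxnet import gluon, autograd

net = gluon.nn.Dense(2)
net.initialize(ctx=mx.cpu())
# 'dist_sync' synchronizes gradients across machines on every step.
trainer = gluon.Trainer(net.collect_params(), 'adam',
                        {'learning_rate': 1e-3}, kvstore='dist_sync')
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

x, y = mx.nd.random.uniform(shape=(8, 4)), mx.nd.zeros((8,))
with autograd.record():
    loss = loss_fn(net(x), y)
loss.backward()
trainer.step(batch_size=8)
```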

fierceX commented 5 years ago

I think we can add some text matching models (#616).

sravanbabuiitm commented 5 years ago

  1. We can look at adding HAN (Hierarchical Attention Networks for document classification: https://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf) to the text classification scripts; a sketch of the attention pooling follows this list.
  2. We can add the HAN model and fastText models to the d2l.ai book.
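A minimal NumPy sketch of HAN-style attention pooling over word encodings for one sentence, following Yang et al. (2016); shapes and names are illustrative, not an existing GluonNLP API:

```python
import numpy as np

def attention_pool(H, W, b, u_w):
    """H: (T, d) word encodings; W: (d, d); b: (d,); u_w: (d,) context vector."""
    U = np.tanh(H @ W + b)               # (T, d) hidden representations
    scores = U @ u_w                     # (T,) similarity with context vector
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                 # softmax over words
    return alpha @ H                     # (d,) attention-weighted sentence vector

T, d = 5, 8
rng = np.random.default_rng(0)
s = attention_pool(rng.normal(size=(T, d)), rng.normal(size=(d, d)),
                   np.zeros(d), rng.normal(size=d))
```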
markusdr commented 5 years ago

JSON/YAML config files that specify models and experiments would be great. That's a very popular feature of AllenNLP and other toolkits.

https://github.com/dmlc/gluon-nlp/issues/392

For an example AllenNLP config file, see https://github.com/allenai/allennlp/blob/master/tutorials/tagger/experiment.jsonnet

See also their tutorial: https://github.com/allenai/allennlp/blob/master/tutorials/tagger/README.md#using-config-files

Quote:

This means that most of your experiment can be specified declaratively in a separate configuration file, which serves as a record of exactly what experiments you ran with which parameters. Now you can change various aspects of your model without writing any code. For instance, if you wanted to use a GRU instead of an LSTM, you'd just need to change the appropriate entry in the configuration file.
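A minimal sketch of what config-driven construction could look like in GluonNLP; the config schema below is invented for illustration, not an existing format:

```python
import json
from mxnet import gluon

config = json.loads("""
{
  "encoder": {"type": "lstm", "hidden_size": 256, "num_layers": 2}
}
""")

def build_encoder(cfg):
    # Swapping "lstm" for "gru" in the config file changes the model
    # without touching any code.
    rnn_types = {"lstm": gluon.rnn.LSTM, "gru": gluon.rnn.GRU}
    cls = rnn_types[cfg["type"]]
    return cls(cfg["hidden_size"], num_layers=cfg["num_layers"])

encoder = build_encoder(config["encoder"])
```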

eric-haibin-lin commented 5 years ago

Add "Deep Relevance Matching Model" for relevance matching Here is one implementation in keras: https://github.com/sebastian-hofstaetter/neural-ranking-drmm

eric-haibin-lin commented 5 years ago

Add Simple Recurrent Unit (SRU) https://arxiv.org/pdf/1709.02755.pdf
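For reference, a minimal NumPy sketch of the single-layer SRU recurrence from the paper; names and shapes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_forward(X, W, Wf, bf, Wr, br):
    """X: (T, d) inputs; W, Wf, Wr: (d, d) weights; bf, br: (d,) biases."""
    T, d = X.shape
    c = np.zeros(d)
    H = np.empty((T, d))
    for t in range(T):
        x = X[t]
        x_tilde = W @ x
        f = sigmoid(Wf @ x + bf)             # forget gate: no h_{t-1} term, so
        r = sigmoid(Wr @ x + br)             # all matmuls can batch over time
        c = f * c + (1 - f) * x_tilde        # light recurrence (elementwise only)
        H[t] = r * np.tanh(c) + (1 - r) * x  # highway connection to the input
    return H
```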

eric-haibin-lin commented 5 years ago

ERNIE: BERT with Knowledge pretraining: https://arxiv.org/pdf/1904.09223.pdf

eric-haibin-lin commented 4 years ago

- Topic models like LDA
- Lemmatization and stemming for preprocessing
- Expose a sparse n-gram or BOW representation for sentences/documents to the user (sketch below)

https://github.com/dmlc/gluon-nlp/issues/822
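For the last item, a minimal sketch of a sparse n-gram/BOW representation using a plain `Counter` as the sparse container; illustrative, not a GluonNLP API:

```python
from collections import Counter

def bow_ngrams(tokens, n_max=2):
    """Count all n-grams up to length n_max as a sparse bag of words."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

print(bow_ngrams("the cat sat on the mat".split()))
```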

eric-haibin-lin commented 4 years ago

MASS: Masked Sequence to Sequence Pre-training for Language Generation https://arxiv.org/pdf/1905.02450.pdf

eric-haibin-lin commented 4 years ago

Pay Less Attention With Lightweight and Dynamic Convolutions https://arxiv.org/pdf/1901.10430.pdf

eric-haibin-lin commented 4 years ago

Poincaré Word Embedding: https://arxiv.org/pdf/1705.08039.pdf
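For reference, the Poincaré distance these embeddings optimize (Nickel & Kiela), as a minimal NumPy sketch; points live in the open unit ball, and distances grow rapidly near the boundary:

```python
import numpy as np

def poincare_distance(u, v):
    """Hyperbolic distance between points u, v in the open unit ball."""
    diff = np.sum((u - v) ** 2)
    uu, vv = np.sum(u ** 2), np.sum(v ** 2)
    x = 1.0 + 2.0 * diff / ((1.0 - uu) * (1.0 - vv))
    return np.arccosh(x)

print(poincare_distance(np.array([0.1, 0.0]), np.array([0.0, 0.5])))
```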

eric-haibin-lin commented 4 years ago

GPT-2 Training from Scratch

eric-haibin-lin commented 4 years ago

Transformer/BERT visualization:

- [arXiv 2019] Visualizing and Measuring the Geometry of BERT
- [ICML 2017] Understanding Black-box Predictions via Influence Functions

eric-haibin-lin commented 4 years ago

Memory networks:

sundeepteki commented 4 years ago

eric-haibin-lin commented 4 years ago

Adaptive softmax and adaptive embedding https://arxiv.org/pdf/1809.10853.pdf

leezu commented 4 years ago

Adaptive softmax and adaptive embedding are already part of scripts/language_model. Let's clarify the roadmap item as: move this support into the main API.
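For context, a minimal NumPy sketch of the adaptive embedding idea from the paper: frequent tokens get full-size vectors, rarer bands get smaller ones projected up to d. Band cutoffs and names are illustrative; the scripts/language_model version is the reference here:

```python
import numpy as np

class AdaptiveEmbedding:
    def __init__(self, cutoffs, d, factor=4, seed=0):
        rng = np.random.default_rng(seed)
        self.cutoffs = cutoffs                  # e.g. [0, 100, 1000, vocab_size]
        self.tables, self.projs = [], []
        for i in range(len(cutoffs) - 1):
            size = cutoffs[i + 1] - cutoffs[i]
            d_i = d // (factor ** i)            # shrink dims for rarer bands
            self.tables.append(rng.normal(scale=0.02, size=(size, d_i)))
            self.projs.append(rng.normal(scale=0.02, size=(d_i, d)))

    def __call__(self, ids):
        out = np.zeros((len(ids), self.projs[0].shape[1]))
        for i in range(len(self.tables)):
            lo, hi = self.cutoffs[i], self.cutoffs[i + 1]
            mask = (ids >= lo) & (ids < hi)
            if mask.any():
                # Look up in the band's small table, then project to d dims.
                out[mask] = self.tables[i][ids[mask] - lo] @ self.projs[i]
        return out

emb = AdaptiveEmbedding([0, 100, 1000, 10000], d=64)
vecs = emb(np.array([3, 250, 9000]))
```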

eric-haibin-lin commented 4 years ago

Pointer mechanism for attention: https://github.com/dmlc/gluon-nlp/issues/951
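A minimal sketch of a pointer/copy mixture in the pointer-generator style (See et al., 2017), assuming precomputed attention weights and a generation gate; names are illustrative:

```python
import numpy as np

def pointer_mixture(p_vocab, attn, src_ids, p_gen):
    """p_vocab: (V,) softmax over vocab; attn: (S,) attention over source;
    src_ids: (S,) vocab ids of source tokens; p_gen: generation gate in [0, 1]."""
    final = p_gen * p_vocab
    # Scatter-add copy mass onto the vocab ids of the source tokens
    # (np.add.at handles repeated source ids correctly).
    np.add.at(final, src_ids, (1.0 - p_gen) * attn)
    return final

V = 10
dist = pointer_mixture(np.full(V, 1.0 / V),
                       np.array([0.7, 0.2, 0.1]),
                       np.array([2, 5, 2]), p_gen=0.8)
assert np.isclose(dist.sum(), 1.0)
```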

eric-haibin-lin commented 4 years ago

ALBERT: https://github.com/google-research/google-research/tree/master/albert

eric-haibin-lin commented 4 years ago

XLM-Roberta: https://github.com/pytorch/fairseq/tree/master/examples/xlmr

fierceX commented 4 years ago

I can try TinyBERT.

szha commented 4 years ago

We are working on the NumPy version of GluonNLP and will adjust the positioning of this package accordingly. We will consider the roadmap items here in the related areas, as well as more recent research.