dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

[BERT] Sentence Pair Classification Followups #425

Closed: eric-haibin-lin closed this issue 5 years ago

eric-haibin-lin commented 5 years ago
szha commented 5 years ago

There's also the conversion script and the pre-trained model for Chinese.

eric-haibin-lin commented 5 years ago

Task for Chinese pre-trained model:

eric-haibin-lin commented 5 years ago

We also need to update the fine-tuning script to use the BERTTokenizer API introduced in https://github.com/dmlc/gluon-nlp/pull/464. Does anyone want to take that?
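
For context on what the tokenizer does to each word: after basic splitting, BERT applies greedy longest-match-first WordPiece tokenization, where non-initial sub-tokens carry a `##` prefix and unmatchable words become `[UNK]`. The snippet below is a minimal, self-contained sketch of that algorithm. It is not the gluonnlp `BERTTokenizer` API. `wordpiece_tokenize`, `tokenize` and the toy vocabulary are hypothetical, and the basic step here is simplified to lowercasing plus whitespace splitting.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk_token: str = "[UNK]",
                       max_chars: int = 100) -> List[str]:
    """Split one whitespace-free word into WordPiece sub-tokens (hypothetical helper).

    Greedy longest-match-first: at each position take the longest vocabulary
    entry that matches; non-initial pieces carry a '##' prefix. If any position
    cannot be matched, the whole word becomes unk_token.
    """
    if len(word) > max_chars:
        return [unk_token]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate substring from the right until it is in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: Set[str]) -> List[str]:
    # Assumed simplified "basic" step: lowercase and split on whitespace only.
    # BERT's real basic tokenizer also splits punctuation and handles accents/CJK.
    tokens: List[str] = []
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    return tokens


if __name__ == "__main__":
    toy_vocab = {"the", "un", "##aff", "##able", "man", "##s"}
    print(tokenize("The unaffable mans", toy_vocab))
    # -> ['the', 'un', '##aff', '##able', 'man', '##s']
```

The greedy strategy keeps the loop simple. A word is split into the fewest pieces the left-to-right longest-match rule finds, which is not necessarily the globally fewest pieces.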

Ishitori commented 5 years ago

@eric-haibin-lin, I can take a look.

DushyantaDhyani commented 4 years ago

What's the status of BertDetokenizer? I can't seem to find it anywhere.

eric-haibin-lin commented 4 years ago

Looks like I missed it in the list. I created https://github.com/dmlc/gluon-nlp/issues/1047 to track it. The embedding script at https://github.com/dmlc/gluon-nlp/blob/v0.8.x/scripts/bert/embedding.py#L183-L197 has a function that roughly performs de-tokenization, but it is not yet available through an API. @DushyantaDhyani, would you like to contribute one?
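
A minimal sketch of what such a detokenizer could do, assuming the standard WordPiece convention that continuation pieces start with `##`. `detokenize_wordpieces` is a hypothetical name, not an existing gluonnlp API, and this is not the code from embedding.py.

```python
from typing import List, Sequence


def detokenize_wordpieces(tokens: Sequence[str],
                          special_tokens: Sequence[str] = ("[CLS]", "[SEP]", "[PAD]")) -> List[str]:
    """Merge WordPiece sub-tokens back into whole words (hypothetical helper).

    A token starting with '##' continues the previous word, so its text
    (without the '##' marker) is appended to the last word. Special tokens
    are dropped. A leading '##' piece with no previous word is kept unchanged.
    """
    words: List[str] = []
    for token in tokens:
        if token in special_tokens:
            continue
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return words


if __name__ == "__main__":
    print(detokenize_wordpieces(["[CLS]", "the", "un", "##aff", "##able", "man", "##s", "[SEP]"]))
    # -> ['the', 'unaffable', 'mans']
```

Note that this only reverses the sub-word split. Lowercasing, `[UNK]` replacement and whitespace around punctuation are lost during tokenization and cannot be recovered this way.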