There's also the conversion script and the pre-trained model for Chinese.
Task for the Chinese pre-trained model: `<S>` and `<T>` need to be reserved tokens in the vocab. We also need to update the fine-tuning script to use the BERTTokenizer API introduced in https://github.com/dmlc/gluon-nlp/pull/464. Does anyone want to take that?
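For reference, a minimal sketch of what using that API could look like (this assumes a gluon-nlp version that ships `nlp.data.BERTTokenizer` and exposes the Chinese vocab under the `wiki_cn_cased` dataset name; adjust names to match the actual release):

```python
# Sketch: tokenizing Chinese text with the BERTTokenizer API from PR #464.
# The dataset name 'wiki_cn_cased' is an assumption; check get_model's docs.
import gluonnlp as nlp

# Load the vocabulary shipped with the Chinese pre-trained model
# (pretrained=False skips downloading the weights but still returns the vocab).
_, vocab = nlp.model.get_model('bert_12_768_12',
                               dataset_name='wiki_cn_cased',
                               pretrained=False)

# BERTTokenizer runs basic tokenization plus WordPiece on top of the vocab.
tokenizer = nlp.data.BERTTokenizer(vocab, lower=False)
tokens = tokenizer(u'今天天气不错')
token_ids = vocab[tokens]  # map subword tokens to ids for the model
print(tokens, token_ids)
```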
@eric-haibin-lin, I can take a look.
What's the status of BertDetokenizer? I can't seem to find it anywhere.
Looks like I missed it in the list. I created a new issue to track that: https://github.com/dmlc/gluon-nlp/issues/1047. The embedding script in https://github.com/dmlc/gluon-nlp/blob/v0.8.x/scripts/bert/embedding.py#L183-L197 has a function that does a rough form of de-tokenization, but it's not available through an API yet. @DushyantaDhyani, would you like to contribute one?
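For anyone picking this up, here is a rough standalone sketch of the idea behind that helper: merge `##`-prefixed WordPiece continuation pieces back into the preceding token. `detokenize` is a hypothetical name for illustration, not an existing gluon-nlp API:

```python
# Sketch of WordPiece de-tokenization: glue '##' continuation pieces
# back onto the previous token to recover whole words.
def detokenize(tokens):
    words = []
    for token in tokens:
        if token.startswith('##') and words:
            words[-1] += token[2:]  # continuation piece: append to last word
        else:
            words.append(token)     # start of a new word
    return words

print(detokenize(['de', '##tok', '##eni', '##zation', 'works']))
# ['detokenization', 'works']
```

A proper API would also need to handle spacing around punctuation and any special tokens like [CLS]/[SEP], which this sketch ignores.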