Closed dmitrytyrin closed 4 years ago
@dmitrytyrin could you try the punctuation notebook and use PRETRAINED_BERT_MODEL = "bert-base-multilingual-uncased"?
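In the notebook, that suggestion amounts to changing a single setting (a minimal sketch; the variable name follows the punctuation notebook referenced in this thread):

```python
# Pretrained-model setting for the NeMo punctuation notebook,
# switched to the multilingual uncased BERT checkpoint as suggested above.
PRETRAINED_BERT_MODEL = "bert-base-multilingual-uncased"
```

The rest of the notebook (tokenizer, data layers, heads) then picks up this name when building the model.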
@ekmb, thank you, it works!
How can I perform only inference with a pretrained BERT? How can I get punct_label_ids/capit_label_ids if I don't have a train_data_layer?
You need to train the model before doing inference: besides the pretrained BERT part, there are 2 token classification heads that need to be trained.
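For context, punct_label_ids is just a mapping from label strings to integer ids that the train data layer builds from the training set, which is why it is unavailable without one. A minimal illustration of how such a mapping is used at inference time (the label set here is an assumption for illustration, not the notebook's actual labels):

```python
# Hypothetical label-id mapping of the kind the train data layer builds.
punct_labels = ["O", ",", ".", "?"]
punct_label_ids = {label: i for i, label in enumerate(punct_labels)}
ids_to_labels = {i: label for label, i in punct_label_ids.items()}

# At inference time, the classification head's argmax ids are mapped
# back to punctuation labels via this dictionary.
predicted_ids = [0, 0, 1, 0, 2]
predicted_labels = [ids_to_labels[i] for i in predicted_ids]
# predicted_labels == ["O", "O", ",", "O", "."]
```

The capitalization head uses an analogous capit_label_ids mapping built the same way.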
Dear team, I am trying to predict punctuation following this tutorial: https://nvidia.github.io/NeMo/nlp/punctuation.html. I can't define the tokenizer using the pretrained "bert-base-multilingual-uncased" model.
tokenizer = nemo.collections.nlp.data.NemoBertTokenizer(pretrained_model="bert-base-multilingual-uncased")
gives me an error. I manually downloaded BERT from https://github.com/google-research/bert/blob/master/multilingual.md and tried to use an absolute path in pretrained_model_name. (The folder "multilingual_L-12_H-768_A-12" contains config.json, model.data, model.index, model.meta and vocab.txt.) The tokenizer again gives an error. How can I solve my issue?