TheArowanaDude closed this issue 4 years ago
Just to clarify, you'd like to use a fine-tuned BERT model (fine-tuned via huggingface's pytorch-transformers) in allennlp's BERT token embedder? If this is the case, providing the directory where the BERT model was serialized as the model name should work. Can you provide the snippet of your BERT token embedder/indexer configuration here?
Yes! I followed this template https://gist.github.com/joelgrus/7cdb8fb2d81483a8d9ca121d9c617514
```json
"token_indexers": {
  "bert": {
    "type": "bert-pretrained",
    "pretrained_model": "bert-large-cased-vocab.txt",
    "do_lowercase": false,
    "use_starting_offsets": true
  }
},
"token_embedders": {
  "bert": {
    "type": "bert-pretrained",
    "pretrained_model": "wwm_cased_L-24_H-1024_A-16"
  }
}
```
The values of both `"pretrained_model"` keys should be the absolute path to the serialization directory of the fine-tuned BERT model (or one of the default BERT model names).
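For example, a sketch of what both entries might look like once they point at the same serialization directory (the path below is hypothetical):

```json
"token_indexers": {
  "bert": {
    "type": "bert-pretrained",
    "pretrained_model": "/home/user/finetuned-bert/",
    "do_lowercase": false,
    "use_starting_offsets": true
  }
},
"token_embedders": {
  "bert": {
    "type": "bert-pretrained",
    "pretrained_model": "/home/user/finetuned-bert/"
  }
}
```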
Ah okay, should I edit the config file in the serialized model? I tried to unzip it, modify config.json, and re-zip, but it gave me another error:

`FileNotFoundError: file /tmp/tmpj8o6jszw/config.json not found`
What do the contents of your serialization directory look like? It should contain unzipped config.json, vocab.txt, and pytorch_model.bin files; you shouldn't need to edit the config file.
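A quick way to check the directory is a small standard-library script; this is a minimal sketch (the helper name and demo paths are made up, and the file list is the one mentioned above):

```python
import os
import tempfile

required = ["config.json", "vocab.txt", "pytorch_model.bin"]

def missing_files(serialization_dir):
    """Return the required BERT files that are absent from the directory."""
    return [f for f in required
            if not os.path.isfile(os.path.join(serialization_dir, f))]

# Demo against a throwaway directory containing only vocab.txt
d = tempfile.mkdtemp()
open(os.path.join(d, "vocab.txt"), "w").close()
print(missing_files(d))  # config.json and pytorch_model.bin are missing
```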
It contains all of those files: bert_config.json, vocab.txt, and pytorch_model.bin. That confuses me, since I was able to train successfully.
Please provide a full stack trace of the error you are getting (as requested in the issue template).
I assumed it was a path or URL problem. You can unzip the file and edit the configuration file to make sure that your path is correct. It worked for me.
I have a solution for this issue which worked for me:
If you unzip the archive, edit the config file, and re-zip it using `tar -zcvf`, it fails (it failed for me too). Instead of zipping with `tar -zcvf`, do the following:
```python
from allennlp.models.archival import archive_model

archive_model(
    serialization_dir='path/to/model_tar_dir',
    archive_path='path/to/model.tar.gz',
    weights='name_of_your_weights_file.th',
)
```
This loads the archive without errors like `FileNotFoundError: file /tmp/tmpj8o6jszw/config.json not found`.
Hope this helps!
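A plausible explanation for the `tar -zcvf` failure: tarring the directory itself stores entries as `model_tar_dir/config.json`, so when the loader extracts the archive to a temp directory it can't find `config.json` at the top level. A standard-library sketch of building a flat archive instead (the helper name and demo paths are hypothetical):

```python
import os
import tarfile
import tempfile

def flat_targz(src_dir, archive_path):
    """Archive the files in src_dir at the root of the tarball
    (arcname strips the leading directory), matching the flat
    layout expected when the archive is extracted to a temp dir."""
    with tarfile.open(archive_path, "w:gz") as tf:
        for name in os.listdir(src_dir):
            tf.add(os.path.join(src_dir, name), arcname=name)

# Demo with a throwaway directory holding a stand-in config.json
src = tempfile.mkdtemp()
open(os.path.join(src, "config.json"), "w").close()
out = os.path.join(tempfile.mkdtemp(), "model.tar.gz")
flat_targz(src, out)
with tarfile.open(out) as tf:
    print(tf.getnames())  # ['config.json'] -- no directory prefix
```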
Closing due to inactivity
Sorry for commenting on a closed topic, but I'm having trouble interpreting my model. I have a model trained for an NER task using "xlm-roberta-base", and I have the weights file model.pt. I want to interpret the model using AllenNLP. I can see the guideline below, but I have no idea how to create the archive.
```python
from allennlp.interpret.saliency_interpreters import SimpleGradient
from allennlp.predictors import Predictor

inputs = {"sentence": "a very well-made, funny and entertaining picture."}
archive = (
    "https://storage.googleapis.com/allennlp-public-models/basic_stanford_sentiment_treebank-2020.06.09.tar.gz"
)
predictor = Predictor.from_path(archive)
interpreter = SimpleGradient(predictor)
interpretation = interpreter.saliency_interpret_from_json(inputs)
print(interpretation)
```
Looking forward to everyone's help, thanks in advance!
Hi, I successfully trained my own BERT model, but when I tried loading it via the Python interface I got this error:
I managed to train the model successfully; I just don't understand why it fails to load now. I would greatly appreciate guidance and help on this!