Closed. None403 closed this issue 4 years ago.
It should be downloaded automatically from the server with the following lines.
from transformers import AutoConfig, AutoTokenizer, AutoModelWithLMHead
config = AutoConfig.from_pretrained(params_senteval["model_type"], cache_dir='./cache')
config.output_hidden_states = True
tokenizer = AutoTokenizer.from_pretrained(params_senteval["model_type"], cache_dir='./cache')
model = AutoModelWithLMHead.from_pretrained(params_senteval["model_type"], config=config, cache_dir='./cache')
Can you paste your error here so I can debug? Have you created a folder named 'cache' in the main directory?
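If the folder is missing, it can be created before loading. A minimal sketch, assuming the relative path `./cache` matches the `cache_dir='./cache'` argument used above:

```python
import os

# Create the directory that cache_dir='./cache' points at.
# exist_ok=True makes this a no-op if the folder already exists.
os.makedirs("./cache", exist_ok=True)

print(os.path.isdir("./cache"))  # True
```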
The script I ran is example2.sh, and it seems that other errors occurred. The full output is below:
root@0cb791f65a07:~/SBERT-WK-Sentence-Embedding# ./example2.sh --model_type binwang/bert-base-nli --model_type binwang/bert-base-uncased --embed_method dissecting --max_seq_length 64 --batch_size 1 --context_window_size 2 --layer_start 4 --tasks sts
2020-03-05 09:35:19,095 : Starting new HTTPS connection (1): s3.amazonaws.com:443
2020-03-05 09:35:20,269 : https://s3.amazonaws.com:443 "HEAD /models.huggingface.co/bert/binwang/bert-base-uncased/config.json HTTP/1.1" 200 0
2020-03-05 09:35:20,284 : loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/binwang/bert-base-uncased/config.json from cache at ./cache/19e969ebebc46506a7d80830232146353b99b1f30bff8aff6e115d2dcbcc4afd.913dd763a263b43d0803c5b4cd8e6810e129f390793e910ba19e547a266e6b6f
2020-03-05 09:35:20,286 : Model config {
  "architectures": ["BertForMaskedLM"],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "do_sample": false,
  "eos_token_ids": 0,
  "finetuning_task": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {"0": "LABEL_0", "1": "LABEL_1"},
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": false,
  "label2id": {"LABEL_0": 0, "LABEL_1": 1},
  "layer_norm_eps": 1e-12,
  "length_penalty": 1.0,
  "max_length": 20,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_beams": 1,
  "num_hidden_layers": 12,
  "num_labels": 2,
  "num_return_sequences": 1,
  "output_attentions": false,
  "output_hidden_states": true,
  "output_past": true,
  "pad_token_id": 0,
  "pruned_heads": {},
  "repetition_penalty": 1.0,
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "torchscript": false,
  "type_vocab_size": 2,
  "use_bfloat16": false,
  "vocab_size": 30522
}
2020-03-05 09:35:20,288 : Model name 'binwang/bert-base-uncased' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). Assuming 'binwang/bert-base-uncased' is a path or url to a directory containing tokenizer files.
2020-03-05 09:35:20,288 : Didn't find file binwang/bert-base-uncased/added_tokens.json. We won't load it.
2020-03-05 09:35:20,288 : Didn't find file binwang/bert-base-uncased/special_tokens_map.json. We won't load it.
2020-03-05 09:35:20,288 : Didn't find file binwang/bert-base-uncased/tokenizer_config.json. We won't load it.
2020-03-05 09:35:20,290 : Starting new HTTPS connection (1): s3.amazonaws.com:443
2020-03-05 09:35:21,521 : https://s3.amazonaws.com:443 "HEAD /models.huggingface.co/bert/binwang/bert-base-uncased/vocab.txt HTTP/1.1" 200 0
2020-03-05 09:35:21,528 : loading file https://s3.amazonaws.com/models.huggingface.co/bert/binwang/bert-base-uncased/vocab.txt from cache at ./cache/2c727aa1d252b261a4f15e04ad1beec8403f40d2eab4fbe998f1ae804b522b06.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
2020-03-05 09:35:21,530 : loading file None
2020-03-05 09:35:21,530 : loading file None
2020-03-05 09:35:21,530 : loading file None
2020-03-05 09:35:21,801 : Starting new HTTPS connection (1): s3.amazonaws.com:443
2020-03-05 09:35:22,963 : https://s3.amazonaws.com:443 "HEAD /models.huggingface.co/bert/binwang/bert-base-uncased/pytorch_model.bin HTTP/1.1" 200 0
2020-03-05 09:35:22,971 : loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/binwang/bert-base-uncased/pytorch_model.bin from cache at ./cache/c1c9f3dafc802586d46285b1383200b5747305ab65e3c92b1a83c18ff82a1b37.e20ed098e5dc4a7be382b8fb2b1438a2271c71d5328590f86d29b64f2c0b23ac
2020-03-05 09:35:29,571 : Weights from pretrained model not used in BertForMaskedLM: ['cls.predictions.decoder.bias']
Traceback (most recent call last):
File "sen_emb.py", line 67, in <module>
Hello, my friend! These three files were downloaded successfully: pytorch_model.bin, config.json, and vocab.txt. However, the following three files seem to have failed to download; if convenient, could you upload them again? Thanks!! (#^.^#): added_tokens.json, special_tokens_map.json, and tokenizer_config.json
Hi @None403, Thanks for posting the results here.
Your error comes from running out of GPU memory. I would recommend using a GPU with at least 6 GB of memory, or switching to the CPU. BERT is not a small model, so it requires a fair amount of GPU memory.
If you want it to work on the CPU, which may be slow, you can change calls like XX.to("cuda") to XX.to("cpu"), or simply remove them.
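Note that PyTorch device strings are case-sensitive lowercase, so XX.to("CUDA") would raise an error. A minimal sketch of the fallback logic; `pick_device` is a hypothetical helper, and in the real script the flag would come from `torch.cuda.is_available()`:

```python
def pick_device(cuda_available: bool) -> str:
    """Return the device string to pass to model.to(...)."""
    # PyTorch expects lowercase "cuda"/"cpu"; "CUDA" is not a valid device.
    return "cuda" if cuda_available else "cpu"

# In the actual script: device = pick_device(torch.cuda.is_available())
print(pick_device(False))  # cpu
```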
Solved.
Hello, when I ran it, it reported that three files could not be found. Could you please provide download links? The three files are: added_tokens.json, special_tokens_map.json, and tokenizer_config.json