cooelf / SemBERT

Semantics-aware BERT for Language Understanding (AAAI 2020)
https://arxiv.org/abs/1909.02209
MIT License

Errors while trying to run the model #6

Closed shahard92 closed 4 years ago

shahard92 commented 4 years ago

Hi,

I'm trying to run the model you published on GitHub, and on one machine I'm getting the following error:

RuntimeError: CUDA out of memory. Tried to allocate 90.00 MiB (GPU 0; 6.00 GiB total capacity; 4.17 GiB already allocated; 86.27 MiB free; 330.21 MiB cached)

I tried reducing the batch size (even to 1) and setting --max_seq_length = 10, and I still get this error after exactly 9 epochs.

FYI: I'm running the following command:

python run_classifier.py --data_dir glue_data/MNLI/ --eval_batch_size 1 --max_seq_length 10 --bert_model bert-base-uncased --do_lower_case --task_name mnli --do_train --do_eval --do_predict --output_dir glue/base_mnli --learning_rate 3e-5 --num_train_epochs 200

On another machine I always get this error before training:

pytorch_pretrained_bert.tokenization - Model name 'bert-base-uncased' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese). We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt' was a path or url but couldn't find any file associated to this path or url.
AttributeError: 'NoneType' object has no attribute 'tokenize'
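From the log it looks like BertTokenizer.from_pretrained returned None (instead of raising) because the vocab download from S3 failed, and the later tokenizer.tokenize call then fails with the 'NoneType' error. A rough workaround I'm considering is downloading bert-base-uncased-vocab.txt manually and loading it from disk; this is just a sketch, and the local path below is a placeholder:

from pytorch_pretrained_bert.tokenization import BertTokenizer

# from_pretrained logs the "Model name ... was not found" warning and
# returns None when the vocab file cannot be fetched; loading a local
# copy avoids the download entirely (the path is a placeholder).
tokenizer = BertTokenizer.from_pretrained(
    "C:/models/bert-base-uncased-vocab.txt", do_lower_case=True)
assert tokenizer is not None, "vocab file not found at the given path"

print(tokenizer.tokenize("Semantics-aware BERT"))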

I guess you encountered these errors yourself, so I'm contacting you to see whether you can help me solve them and run the model successfully.

I'd appreciate your help.

Thanks, Shahar

shahard92 commented 4 years ago

Hi,

I solved the issue by adding gc.collect() and torch.cuda.empty_cache() after "del predict_model".
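For reference, a minimal sketch of the change (the dummy linear layer just stands in for the evaluation model that run_classifier.py rebuilds every epoch; without the two extra calls, PyTorch's caching allocator keeps the old model's blocks reserved and a 6 GiB GPU eventually runs out):

import gc
import torch

# Stand-in for the evaluation model that the script re-creates each epoch.
predict_model = torch.nn.Linear(768, 3)
if torch.cuda.is_available():
    predict_model = predict_model.cuda()

del predict_model         # as in run_classifier.py
gc.collect()              # collect reference cycles so the tensors are actually freed
torch.cuda.empty_cache()  # hand the cached blocks back to the CUDA driver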

Now I just get bad results when training and evaluating on MNLI: even after ~80 epochs I still get around 0.333 accuracy on evaluation.

These are the parameters I run with:

python run_classifier.py --data_dir glue_data/MNLI/ --max_seq_length 200 --bert_model bert-base-uncased --do_lower_case --task_name mnli --do_train --do_eval --output_dir glue/base_mnli --learning_rate 3e-5 --num_train_epochs 300 --train_batch_size 8 --eval_batch_size 32

I'd appreciate your help in understanding why the evaluation is so bad (the loss does go down to nearly zero on the training set):

02/23/2020 15:11:21 - INFO - main - Epoch: 79, eval_accuracy = 0.27450980392156865
02/23/2020 15:11:21 - INFO - main - Epoch: 79, eval_loss = 5.328779582883797
02/23/2020 15:11:21 - INFO - main - Epoch: 79, global_step = 4080
02/23/2020 15:11:21 - INFO - main - Epoch: 79, loss = 0.0002234776814778646
02/23/2020 15:11:21 - INFO - main - best epoch: 3, result: 0.35294117647058826

Thanks, Shahar

shahard92 commented 4 years ago

Never mind, I found that you gave a link to the real data.

melvintzw commented 4 years ago

Hi, I am experiencing some problems running the model as well and would greatly appreciate some help. I run the following command: python run_classifier.py --do_lower_case --do_train --train_batch_size 16 but the script soon seems to terminate prematurely, without any error messages. The output is shown below. Does anyone know how to fix this problem?

(env) PS C:\Users\melvin\FYP\SemBERT> python run_classifier.py --do_lower_case --do_train --train_batch_size 16
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
02/25/2020 16:40:46 - INFO - __main__ -   device: cuda n_gpu: 1, distributed training: False, 16-bits training: False
02/25/2020 16:40:47 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at C:\Users\melvin\.pytorch_pretrained_bert\26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
{'contradiction': 0, 'entailment': 1, 'neutral': 2}
02/25/2020 16:40:47 - INFO - __main__ -   *** Example ***
02/25/2020 16:40:47 - INFO - __main__ -   guid: train-0
02/25/2020 16:40:47 - INFO - __main__ -   tokens: [CLS] conceptual ##ly cream ski ##mming has two basic dimensions - product and geography . [SEP] product and geography are what make cream ski ##mming work . [SEP]
02/25/2020 16:40:47 - INFO - __main__ -   input_ids: 101 17158 2135 6949 8301 25057 2038 2048 3937 9646 1011 4031 1998 10505 1012 102 4031 1998 10505 2024 2054 2191 6949 8301 25057 2147 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   label: neutral (id = 2)
02/25/2020 16:40:47 - INFO - __main__ -   *** Example ***
02/25/2020 16:40:47 - INFO - __main__ -   guid: train-1
02/25/2020 16:40:47 - INFO - __main__ -   tokens: [CLS] you know during the season and i guess at at your level uh you lose them to the next level if if they decide to recall the the parent team the braves decide to call to recall a guy from triple a then a double a guy goes up to replace him and a single a guy goes up to replace him [SEP] you lose the things to the following level if the people recall . [SEP]
02/25/2020 16:40:47 - INFO - __main__ -   input_ids: 101 2017 2113 2076 1996 2161 1998 1045 3984 2012 2012 2115 2504 7910 2017 4558 2068 2000 1996 2279 2504 2065 2065 2027 5630 2000 9131 1996 1996 6687 2136 1996 13980 5630 2000 2655 2000 9131 1037 3124 2013 6420 1037 2059 1037 3313 1037 3124 3632 2039 2000 5672 2032 1998 1037 2309 1037 3124 3632 2039 2000 5672 2032 102 2017 4558 1996 2477 2000 1996 2206 2504 2065 1996 2111 9131 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   label: entailment (id = 1)
02/25/2020 16:40:47 - INFO - __main__ -   *** Example ***
02/25/2020 16:40:47 - INFO - __main__ -   guid: train-2
02/25/2020 16:40:47 - INFO - __main__ -   tokens: [CLS] one of our number will carry out your instructions minute ##ly . [SEP] a member of my team will execute your orders with immense precision . [SEP]
02/25/2020 16:40:47 - INFO - __main__ -   input_ids: 101 2028 1997 2256 2193 2097 4287 2041 2115 8128 3371 2135 1012 102 1037 2266 1997 2026 2136 2097 15389 2115 4449 2007 14269 11718 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   label: entailment (id = 1)
02/25/2020 16:40:47 - INFO - __main__ -   *** Example ***
02/25/2020 16:40:47 - INFO - __main__ -   guid: train-3
02/25/2020 16:40:47 - INFO - __main__ -   tokens: [CLS] how do you know ? all this is their information again . [SEP] this information belongs to them . [SEP]
02/25/2020 16:40:47 - INFO - __main__ -   input_ids: 101 2129 2079 2017 2113 1029 2035 2023 2003 2037 2592 2153 1012 102 2023 2592 7460 2000 2068 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   label: entailment (id = 1)
02/25/2020 16:40:47 - INFO - __main__ -   *** Example ***
02/25/2020 16:40:47 - INFO - __main__ -   guid: train-4
02/25/2020 16:40:47 - INFO - __main__ -   tokens: [CLS] yeah i tell you what though if you go price some of those tennis shoes i can see why now you know they ' re getting up in the hundred dollar range [SEP] the tennis shoes have a range of prices . [SEP]
02/25/2020 16:40:47 - INFO - __main__ -   input_ids: 101 3398 1045 2425 2017 2054 2295 2065 2017 2175 3976 2070 1997 2216 5093 6007 1045 2064 2156 2339 2085 2017 2113 2027 1005 2128 2893 2039 1999 1996 3634 7922 2846 102 1996 5093 6007 2031 1037 2846 1997 7597 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/25/2020 16:40:47 - INFO - __main__ -   label: neutral (id = 2)
tokenizer vocab size:  23
02/25/2020 16:40:49 - INFO - pytorch_pretrained_bert.modeling -   loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz from cache at C:\Users\melvin\.pytorch_pretrained_bert\distributed_-1\9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
02/25/2020 16:40:49 - INFO - pytorch_pretrained_bert.modeling -   extracting archive file C:\Users\melvin\.pytorch_pretrained_bert\distributed_-1\9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba to temp dir C:\Users\melvin\AppData\Local\Temp\tmp_mc4ttgk
02/25/2020 16:40:52 - INFO - pytorch_pretrained_bert.modeling -   Model config {
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "type_vocab_size": 2,
  "vocab_size": 30522
}

02/25/2020 16:40:53 - INFO - pytorch_pretrained_bert.modeling -   Weights of BertForSequenceClassificationTag not initialized from pretrained model: ['cnn.char_cnn.weight', 'cnn.char_cnn.bias', 'tag_model.embed.tag_embeddings.weight', 'tag_model.embed.LayerNorm.weight', 'tag_model.embed.LayerNorm.bias', 'tag_model.fc.weight', 'tag_model.fc.bias', 'dense.weight', 'dense.bias', 'pool.weight', 'pool.bias', 'classifier.weight', 'classifier.bias']
02/25/2020 16:40:53 - INFO - pytorch_pretrained_bert.modeling -   Weights from pretrained model not used in BertForSequenceClassificationTag: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
02/25/2020 16:40:55 - INFO - __main__ -   ***** Running training *****
02/25/2020 16:40:55 - INFO - __main__ -     Num examples = 51
02/25/2020 16:40:55 - INFO - __main__ -     Batch size = 16
02/25/2020 16:40:55 - INFO - __main__ -     Num steps = 9
Epoch:   0%|                                                                                     | 0/3 [00:00<?, ?it/s]
(env) PS C:\Users\melvin\FYP\SemBERT>