Hello,
I have tried to run bert with --with_cuda False, but the model keeps running the "forward" function on CUDA. These are my command line and the error message I got.
bert -c corpus.small -v vocab.small -o bert.model --with_cuda False -e 5
Loading Vocab vocab.small
Vocab Size: 262
Loading Train Dataset corpus.small
Loading Dataset: 113it [00:00, 560232.09it/s]
Loading Test Dataset None
Creating Dataloader
Building BERT model
Creating BERT Trainer
Total Parameters: 6453768
Training Start
EP_train:0: 0%|| 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/yuni/anaconda3/envs/py3/bin/bert", line 8, in <module>
sys.exit(train())
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/bert_pytorch/main.py", line 67, in train
trainer.train(epoch)
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/bert_pytorch/trainer/pretrain.py", line 69, in train
self.iteration(epoch, self.train_data)
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/bert_pytorch/trainer/pretrain.py", line 102, in iteration
next_sent_output, mask_lm_output = self.model.forward(data["bert_input"], data["segment_label"])
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/bert_pytorch/model/language_model.py", line 24, in forward
x = self.bert(x, segment_label)
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/bert_pytorch/model/bert.py", line 46, in forward
x = transformer.forward(x, mask)
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/bert_pytorch/model/transformer.py", line 29, in forward
x = self.input_sublayer(x, lambda _x: self.attention.forward(_x, _x, _x, mask=mask))
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/bert_pytorch/model/utils/sublayer.py", line 18, in forward
return x + self.dropout(sublayer(self.norm(x)))
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/bert_pytorch/model/transformer.py", line 29, in <lambda>
x = self.input_sublayer(x, lambda _x: self.attention.forward(_x, _x, _x, mask=mask))
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/bert_pytorch/model/attention/multi_head.py", line 32, in forward
x, attn = self.attention(query, key, value, mask=mask, dropout=self.dropout)
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yuni/anaconda3/envs/py3/lib/python3.6/site-packages/bert_pytorch/model/attention/single.py", line 25, in forward
return torch.matmul(p_attn, value), p_attn
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 1.95 GiB total capacity; 309.18 MiB already allocated; 125.62 MiB free; 312.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
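In case it helps narrow this down: if the --with_cuda option is declared with argparse's type=bool (an assumption on my part, I have not checked the package source), then passing the string "False" on the command line still evaluates to True, because bool() of any non-empty string is True. A minimal sketch of that Python behavior:

```python
import argparse

# Hypothetical parser mirroring a "--with_cuda" flag declared with type=bool.
parser = argparse.ArgumentParser()
parser.add_argument("--with_cuda", type=bool, default=True)

# argparse applies bool() to the raw string "False";
# bool("False") is True, since the string is non-empty.
args = parser.parse_args(["--with_cuda", "False"])
print(args.with_cuda)  # True
```

If that is the cause, hiding the GPU from PyTorch entirely (e.g. running with the environment variable CUDA_VISIBLE_DEVICES set to an empty string) might confirm it, since torch.cuda.is_available() would then return False regardless of the flag.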