RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when running mT5

I have been trying to run mT5 for the past 2 days and constantly running into weird CUDA errors one after the other.Can anyone please help me with this? Is it because of memory or any other issue?

Error 1: The most common error face is the cublas error: Error stack trace below:

/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/transformers/tokenization_utils_base.py:3260: FutureWarning: `prepare_seq2seq_batch` is deprecated and will be removed in version 5 of 🤗 Transformers. Use the regular `__call__` method to prepare your inputs and the tokenizer under the `with_target_tokenizer` context manager to prepare your targets. See the documentation of your specific tokenizer for more details
  FutureWarning,
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 183436/183436 [04:59<00:00, 613.16it/s]
Using Adafactor for T5
Epoch 1 of 3:   0%|                                                                                                                                   | 0/3 [00:00<?, ?it/s/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/transformers/optimization.py:562: UserWarning: This overload of add_ is deprecated:859 [00:00<?, ?it/s]
        add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
        add_(Tensor other, *, Number alpha) (Triggered internally at  /opt/conda/conda-bld/pytorch_1595629403081/work/torch/csrc/utils/python_arg_parser.cpp:766.)
  exp_avg_sq_row.mul_(beta2t).add_(1.0 - beta2t, update.mean(dim=-1))
Epochs 0/3. Running Loss:   12.3577:   0%|▍                                                                                           | 211/45859 [02:09<7:46:35,  1.63it/s]
Epoch 1 of 3:   0%|                                                                                                                                   | 0/3 [02:09<?, ?it/s]
Traceback (most recent call last):
  File "pretrain.py", line 30, in <module>
    model.train_model(train_df) #, eval_data=eval_df)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/simpletransformers/t5/t5_model.py", line 230, in train_model
    **kwargs,
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/simpletransformers/t5/t5_model.py", line 520, in train
    outputs = model(**inputs)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/transformers/models/t5/modeling_t5.py", line 1605, in forward
    return_dict=return_dict,
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/transformers/models/t5/modeling_t5.py", line 996, in forward
    output_attentions=output_attentions,
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/transformers/models/t5/modeling_t5.py", line 689, in forward
    hidden_states = self.layer[-1](hidden_states)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/transformers/models/t5/modeling_t5.py", line 300, in forward
    forwarded_states = self.DenseReluDense(forwarded_states)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/transformers/models/t5/modeling_t5.py", line 279, in forward
    hidden_states = self.wo(hidden_states)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 91, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/torch/nn/functional.py", line 1676, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Error 2 : Illegal Memory Access

Error stack trace below:

Traceback (most recent call last):
  File "pretrain.py", line 30, in <module>
    model.train_model(train_df) #, eval_data=eval_df)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/simpletransformers/t5/t5_model.py", line 230, in train_model
    **kwargs,
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/simpletransformers/t5/t5_model.py", line 513, in train
    inputs = self._get_inputs_dict(batch)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/simpletransformers/t5/t5_model.py", line 1125, in _get_inputs_dict
    batch = tuple(t.to(self.device) for t in batch)
  File "/home/ms/17CS72P02/anaconda3/envs/argfuse/lib/python3.7/site-packages/simpletransformers/t5/t5_model.py", line 1125, in <genexpr>
    batch = tuple(t.to(self.device) for t in batch)
RuntimeError: CUDA error: an illegal memory access was encountered

Below are my machine specs:

Ubuntu 16.04
Python=3.7
Pytorch=1.6
Cudatoolkit = 10.1

The following are my model arguments:

model_args = {
"use_multiprocessing" : False,
"fp16": False,
"overwrite_output_dir" : True,
"max_seq_length" : 98,
"train_batch_size" : 4,
"eval_batch_size" : 4,
"num_train_epochs" : 3,
}

Am I missing any argument?

ThilinaRajapakse / simpletransformers

RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when running mT5 #1166