deepset-ai / FARM

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0

Can't train a language model #853

Closed ofrimasad closed 2 years ago

ofrimasad commented 2 years ago

Question: Hey, I am trying to train a language model called onlplab/alephbert-base (a Hebrew language model, closest to RoBERTa). But when I call trainer.train() I get an error:

Traceback (most recent call last):
  File ".../src/train/train.py", line 161, in <module>
    question_answering(run_name=opt.run_name,
  File ".../src/train/train.py", line 109, in question_answering
    trainer.train()
  File ".../lib/python3.8/site-packages/farm/train.py", line 300, in train
    logits = self.model.forward(**batch)
  File ".../lib/python3.8/site-packages/farm/modeling/adaptive_model.py", line 419, in forward
    sequence_output, pooled_output = self.forward_lm(**kwargs)
  File ".../lib/python3.8/site-packages/farm/modeling/adaptive_model.py", line 463, in forward_lm
    sequence_output, pooled_output = self.language_model(**kwargs, return_dict=False, output_all_encoded_layers=False)
  File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File ".../lib/python3.8/site-packages/farm/modeling/language_model.py", line 679, in forward
    output_tuple = self.model(
  File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File ".../lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 815, in forward
    encoder_outputs = self.encoder(
  File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File ".../lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 508, in forward
    layer_outputs = layer_module(
  File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File ".../lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 395, in forward
    self_attention_outputs = self.attention(
  File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File ".../lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 323, in forward
    self_outputs = self.self(
  File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File ".../lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 187, in forward
    mixed_query_layer = self.query(hidden_states)
  File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File ".../lib/python3.8/site-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File ".../lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

Any idea why that is happening?

My batch size is small (not close to filling the GPU memory). I have also tried LanguageModel.load(lang_model, language_model_class='Roberta'), since my model, like RoBERTa, does not use token_type_ids.
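For reference, here is a rough sketch of my setup. It follows FARM's question answering example; the data paths, file names and hyperparameter values below are placeholders, not my exact values:

from pathlib import Path

from farm.data_handler.data_silo import DataSilo
from farm.data_handler.processor import SquadProcessor
from farm.modeling.adaptive_model import AdaptiveModel
from farm.modeling.language_model import LanguageModel
from farm.modeling.optimization import initialize_optimizer
from farm.modeling.prediction_head import QuestionAnsweringHead
from farm.modeling.tokenization import Tokenizer
from farm.train import Trainer
from farm.utils import initialize_device_settings

lang_model = "onlplab/alephbert-base"
device, n_gpu = initialize_device_settings(use_cuda=True)

tokenizer = Tokenizer.load(pretrained_model_name_or_path=lang_model)
processor = SquadProcessor(
    tokenizer=tokenizer,
    max_seq_len=384,                 # placeholder value
    data_dir=Path("data/squad_he"),  # placeholder path
    train_filename="train.json",     # placeholder file names
    dev_filename="dev.json",
)
data_silo = DataSilo(processor=processor, batch_size=16)  # small batch size

language_model = LanguageModel.load(lang_model, language_model_class="Roberta")
prediction_head = QuestionAnsweringHead()
model = AdaptiveModel(
    language_model=language_model,
    prediction_heads=[prediction_head],
    embeds_dropout_prob=0.1,
    lm_output_types=["per_token"],
    device=device,
)

model, optimizer, lr_schedule = initialize_optimizer(
    model=model,
    learning_rate=3e-5,
    device=device,
    n_batches=len(data_silo.loaders["train"]),
    n_epochs=2,
)
trainer = Trainer(
    model=model,
    optimizer=optimizer,
    data_silo=data_silo,
    epochs=2,
    n_gpu=n_gpu,
    lr_schedule=lr_schedule,
    device=device,
)
trainer.train()  # <- this is where the CUDA error is raised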

Thanks

Additional context: FARM version 0.8.0

Timoeller commented 2 years ago

Hey, this error seems strange. I believe it is related to PyTorch or Hugging Face Transformers rather than being a problem within FARM.

Have you tried running the code on the CPU only? CUDA errors are often uninformative, and a CPU run usually produces a clearer error message.
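Something like this, for example (a sketch assuming you use FARM's device helper; the rest of the setup stays unchanged):

from farm.utils import initialize_device_settings

# Force CPU. CUDA errors are raised asynchronously and are often cryptic;
# running on CPU usually surfaces the actual failing operation.
device, n_gpu = initialize_device_settings(use_cuda=False)
# ...then pass device and n_gpu into AdaptiveModel, initialize_optimizer
# and Trainer as before.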

Timoeller commented 2 years ago

Actually, this post on the PyTorch forums suggests your CUDA device might be running out of memory, so you could try lowering the batch size or max_seq_len. See https://discuss.pytorch.org/t/cuda-error-cublas-status-not-initialized-when-calling-cublascreate-handle/125450/2
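In the sketch above those would be the two knobs (the values here are just examples):

# Reduce GPU memory usage: shorter sequences and fewer samples per batch.
processor = SquadProcessor(tokenizer=tokenizer, max_seq_len=256, data_dir=Path("data/squad_he"))
data_silo = DataSilo(processor=processor, batch_size=4)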

ofrimasad commented 2 years ago

Hi @Timoeller. I have actually managed to narrow this down. My model expects batch['segment_ids'] to be all 0, just like RoBERTa does. When I use the deepset/roberta-base-squad2 model, that is exactly what happens, but when I use my model, batch['segment_ids'] contains 1s as well. I can't find any way in the documentation to force all segment_ids to 0. I suspect that at some point your code checks whether the model is a Roberta instance... Thanks
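Roughly how I checked (a sketch; data_silo is the DataSilo from my setup above):

import torch

# Pull one training batch from the data silo and inspect the segment ids.
batch = next(iter(data_silo.get_data_loader("train")))
print(torch.unique(batch["segment_ids"]))
# with deepset/roberta-base-squad2: tensor([0])
# with onlplab/alephbert-base:      tensor([0, 1])  <- second text segment marked with 1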

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs.