explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

⚠ Aborting and saving the final best model. Encountered exception: RuntimeError('Invalid argument') RuntimeError: Invalid argument #13468

Closed Lance-Owen closed 4 months ago

Lance-Owen commented 5 months ago

I ran into a problem while training a Chinese information extraction model on the GPU provided by Kaggle. The config file was generated with the config generator on the official spaCy website. Any help is greatly appreciated.

Some details of my environment are below. If you need anything else, please leave a comment and I will do my best to provide it.

!nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P100-PCIE-16GB           Off | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0              26W / 250W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
!python -V
Python 3.10.13

!python -m spacy info

============================== Info about spaCy ==============================

spaCy version    3.7.4                         
Location         /opt/conda/lib/python3.10/site-packages/spacy
Platform         Linux-5.15.133+-x86_64-with-glibc2.31
Python version   3.10.13                       
Pipelines        zh_core_web_lg (3.7.0), en_core_web_sm (3.7.1), en_core_web_lg (3.7.1)

There are too many installed Python packages to list here, so I will provide specific versions on request.
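Rather than pasting the full package list, the versions of the libraries that show up in the traceback can be printed with a few lines of standard-library Python. This is only a sketch: the `PACKAGES` list and the `collect_versions` helper are illustrative names, and the choice of packages is an assumption about what is relevant here.

```python
from importlib import metadata

# Distributions that appear in the traceback paths.
# This selection is an assumption; extend it as needed.
PACKAGES = ["spacy", "spacy-transformers", "thinc", "torch", "transformers"]


def collect_versions(names=PACKAGES):
    """Return {distribution name: installed version or 'not installed'}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name:20s} {version}")
```

In a Kaggle notebook the same function can be called directly from a cell instead of relying on the `__main__` guard.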

Command that triggers the error

!python -m spacy project run all

Full error output

ℹ Running workflow 'all'

================================== convert ==================================
ℹ Skipping 'convert': nothing changed

=================================== train ===================================
Running command: /opt/conda/bin/python -m spacy train configs/config.cfg --output training/bid/ --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy --gpu-id 0
ℹ Saving to output directory: training/bid
ℹ Using GPU: 0

=========================== Initializing pipeline ===========================
[2024-04-28 08:10:01,857] [INFO] Set up nlp object from config
[2024-04-28 08:10:01,902] [INFO] Pipeline: ['transformer', 'ner']
[2024-04-28 08:10:01,909] [INFO] Created vocabulary
[2024-04-28 08:10:01,910] [INFO] Finished initializing nlp object
[2024-04-28 08:10:19,274] [INFO] Initialized pipeline components: ['transformer', 'ner']
✔ Initialized pipeline

============================= Training pipeline =============================
ℹ Pipeline: ['transformer', 'ner']
ℹ Initial learn rate: 0.0
E    #       LOSS TRANS...  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  -------------  --------  ------  ------  ------  ------
⚠ Aborting and saving the final best model. Encountered exception:
RuntimeError('Invalid argument')
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.10/site-packages/spacy/__main__.py", line 4, in <module>
    setup_cli()
  File "/opt/conda/lib/python3.10/site-packages/spacy/cli/_util.py", line 87, in setup_cli
    command(prog_name=COMMAND)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 783, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 225, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/spacy/cli/train.py", line 54, in train_cli
    train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
  File "/opt/conda/lib/python3.10/site-packages/spacy/cli/train.py", line 84, in train
    train_nlp(nlp, output_path, use_gpu=use_gpu, stdout=sys.stdout, stderr=sys.stderr)
  File "/opt/conda/lib/python3.10/site-packages/spacy/training/loop.py", line 135, in train
    raise e
  File "/opt/conda/lib/python3.10/site-packages/spacy/training/loop.py", line 118, in train
    for batch, info, is_best_checkpoint in training_step_iterator:
  File "/opt/conda/lib/python3.10/site-packages/spacy/training/loop.py", line 220, in train_while_improving
    nlp.update(
  File "/opt/conda/lib/python3.10/site-packages/spacy/language.py", line 1193, in update
    proc.update(examples, sgd=None, losses=losses, **component_cfg[name])  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/spacy_transformers/pipeline_component.py", line 294, in update
    trf_full, bp_trf_full = self.model.begin_update(docs)
  File "/opt/conda/lib/python3.10/site-packages/thinc/model.py", line 328, in begin_update
    return self._func(self, X, is_train=True)
  File "/opt/conda/lib/python3.10/site-packages/spacy_transformers/layers/transformer_model.py", line 199, in forward
    model_output, bp_tensors = transformer(wordpieces, is_train)
  File "/opt/conda/lib/python3.10/site-packages/thinc/model.py", line 310, in __call__
    return self._func(self, X, is_train=is_train)
  File "/opt/conda/lib/python3.10/site-packages/thinc/layers/pytorchwrapper.py", line 225, in forward
    Ytorch, torch_backprop = model.shims[0](Xtorch, is_train)
  File "/opt/conda/lib/python3.10/site-packages/thinc/shims/pytorch.py", line 95, in __call__
    return self.begin_update(inputs)
  File "/opt/conda/lib/python3.10/site-packages/thinc/shims/pytorch.py", line 129, in begin_update
    output = self._model(*inputs.args, **inputs.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 1013, in forward
    encoder_outputs = self.encoder(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 607, in forward
    layer_outputs = layer_module(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 497, in forward
    self_attention_outputs = self.attention(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 427, in forward
    self_outputs = self.self(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 325, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: Invalid argument

svlandeg commented 4 months ago

Hi! Let me transfer this thread to the discussion forum, as we like to keep the issue tracker focused on bug reports.
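The traceback above ends in a single `torch.matmul` call inside the BERT self-attention layer, so one way to narrow things down is to run that kind of operation on its own, outside spaCy. The sketch below is a diagnostic aid, not a fix: `attention_matmul_check` is a hypothetical helper, the tensor sizes are arbitrary placeholders, and it falls back to CPU (or skips entirely) when no GPU or no PyTorch install is available.

```python
try:
    import torch
except ImportError:  # lets the script run (and skip) where PyTorch is missing
    torch = None


def attention_matmul_check(device=None, batch=2, heads=12, seq_len=16, head_dim=64):
    """Run a batched query @ key^T matmul like the one in the traceback.

    Returns the shape of the result, or None if PyTorch is not installed.
    The sizes are illustrative and not taken from the failing model.
    """
    if torch is None:
        return None
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    query = torch.randn(batch, heads, seq_len, head_dim, device=device)
    key = torch.randn(batch, heads, seq_len, head_dim, device=device)
    scores = torch.matmul(query, key.transpose(-1, -2))
    if scores.is_cuda:
        # CUDA kernels run asynchronously; synchronize so errors surface here.
        torch.cuda.synchronize()
    return tuple(scores.shape)


if __name__ == "__main__":
    if torch is None:
        print("PyTorch is not installed")
    else:
        print("torch", torch.__version__, "| built for CUDA", torch.version.cuda)
        if torch.cuda.is_available():
            print("device:", torch.cuda.get_device_name(0))
            print("compute capability:", torch.cuda.get_device_capability(0))
            print("architectures in this build:", torch.cuda.get_arch_list())
        print("scores shape:", attention_matmul_check())
```

If this isolated call raises the same `RuntimeError: Invalid argument` on the GPU, that would suggest looking at the PyTorch, CUDA, and GPU combination rather than at the spaCy config. If it succeeds, the training data and batch settings are the more likely places to look.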