Ethan-yt / guwenbert

GuwenBERT: A Pre-trained Language Model for Classical Chinese (Literary Chinese)
Apache License 2.0
503 stars 38 forks

Feeding in two sentences causes an index out of range error #11

Closed · Daemon-ser closed this issue 3 years ago

Daemon-ser commented 3 years ago

The code runs with other model names (only the model name was changed; everything uses the transformers Auto classes). With chinese-bert-wwm or other models it works, but with guwenbert it raises a CUDA error during the model's forward computation. [screenshot of the error]

yanqiangmiffy commented 3 years ago

Same error here; it's probably a mismatch between the transformers and torch versions.

Ethan-yt commented 3 years ago

Could you paste your code?

yanqiangmiffy commented 3 years ago

Could you paste your code?

https://github.com/huggingface/transformers/tree/master/examples/multiple-choice

I ran into this problem before, though on an NER task. I eventually solved it by changing the transformers and torch versions, switching back and forth between different versions, which was painful and tedious. https://github.com/z814081807/DeepNER/issues/1#issuecomment-757106404

Ethan-yt commented 3 years ago

The error log shows an index out of range. Check the vocabulary and the training data, e.g. are any sentences too long? If the data looks fine, check the network structure: the output of AutoModel is a vector, and a classification task still needs an FFN layer on top.
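
For instance, a minimal sanity check along these lines (a sketch using the model id mentioned later in this thread, not the reporter's actual code) makes an overflow visible before the forward pass:

from transformers import AutoModel, AutoTokenizer

# Sketch: check whether every encoded id fits its embedding table.
tokenizer = AutoTokenizer.from_pretrained("ethanyt/guwenbert-base")
model = AutoModel.from_pretrained("ethanyt/guwenbert-base")

enc = tokenizer("己所不欲", "勿施于人", return_tensors="pt")
# Each maximum must be strictly smaller than the corresponding table size:
print(enc["input_ids"].max().item(), "<", model.config.vocab_size)
print(enc["token_type_ids"].max().item(), "<", model.config.type_vocab_size)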

Daemon-ser commented 3 years ago

The point is that only the model name changes in the code. With chinese-bert-wwm or any other Chinese model it runs fine; with ethanyt/guwenbert-base or large it fails. It may be the version issue @yanqiangmiffy mentioned...

Ethan-yt commented 3 years ago

The point is that only the model name changes in the code. With chinese-bert-wwm or any other Chinese model it runs fine; with ethanyt/guwenbert-base or large it fails. It may be the version issue @yanqiangmiffy mentioned...

CUDA errors can usually be debugged on the CPU. Running the same code on CPU only (remove model.cuda()) makes it much easier to see where the index goes out of range.
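
A minimal sketch of that workflow (assuming the pair-encoding behaviour described in this thread; on CPU the failure becomes an ordinary Python traceback):

from transformers import AutoModel, AutoTokenizer

# Sketch: reproduce the failure on CPU so the traceback names the exact
# embedding lookup instead of a vague CUDA device-side assert.
tokenizer = AutoTokenizer.from_pretrained("ethanyt/guwenbert-base")
model = AutoModel.from_pretrained("ethanyt/guwenbert-base")  # note: no .cuda()

batch = tokenizer("学而时习之", "不亦说乎", return_tensors="pt")
outputs = model(**batch)  # raises IndexError: index out of range in self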

Daemon-ser commented 3 years ago

The point is that only the model name changes in the code. With chinese-bert-wwm or any other Chinese model it runs fine; with ethanyt/guwenbert-base or large it fails. It may be the version issue @yanqiangmiffy mentioned...

CUDA errors can usually be debugged on the CPU. Running the same code on CPU only (remove model.cuda()) makes it much easier to see where the index goes out of range.

OK.

yanqiangmiffy commented 3 years ago

The error log shows an index out of range. Check the vocabulary and the training data, e.g. are any sentences too long? If the data looks fine, check the network structure: the output of AutoModel is a vector, and a classification task still needs an FFN layer on top.

Right, at first I also thought it was an array index out of range or GPU OOM, but in both tasks (NER and MRC) the error persisted even with batch size and max_len set to low values. In the NER task every model I tried raised the error; in MRC only this model did. I eventually solved it by pinning the transformers and torch versions.

yanqiangmiffy commented 3 years ago

Environment where NER worked:

Package           Version
----------------- -------------------
certifi           2020.6.20
chardet           4.0.0
click             7.1.2
configparser      5.0.1
dataclasses       0.8
docker-pycreds    0.4.0
filelock          3.0.12
gitdb             4.0.5
GitPython         3.1.12
idna              2.10
joblib            1.0.0
numpy             1.19.5
pandas            1.1.5
Pillow            8.1.0
pip               20.2.4
promise           2.3
protobuf          3.14.0
psutil            5.8.0
pyltp             0.2.1
python-dateutil   2.8.1
pytorch-crf       0.7.2
pytz              2020.5
PyYAML            5.3.1
regex             2020.11.13
requests          2.25.1
sacremoses        0.0.43
scikit-learn      0.24.0
scipy             1.5.4
sentencepiece     0.1.94
sentry-sdk        0.19.5
setuptools        50.3.0.post20201006
shortuuid         1.0.1
six               1.15.0
smmap             3.0.4
subprocess32      3.5.4
threadpoolctl     2.1.0
tokenizers        0.7.0
torch             1.7.1
torchvision       0.8.2
tqdm              4.55.1
transformers      2.10.0
typing-extensions 3.7.4.3
urllib3           1.26.2
wandb             0.10.12
watchdog          1.0.2
wheel             0.35.1

Daemon-ser commented 3 years ago

The error log shows an index out of range. Check the vocabulary and the training data, e.g. are any sentences too long? If the data looks fine, check the network structure: the output of AutoModel is a vector, and a classification task still needs an FFN layer on top.

Right, at first I also thought it was an array index out of range or GPU OOM, but in both tasks (NER and MRC) the error persisted even with batch size and max_len set to low values. In the NER task every model I tried raised the error; in MRC only this model did. I eventually solved it by pinning the transformers and torch versions.

I'm currently on torch 1.7 + transformers 3.4; I'll try downgrading transformers.

yanqiangmiffy commented 3 years ago

May I ask which transformers version the author is currently using? I've Googled this bug a lot; the API (e.g. tokenizer arguments) may have changed. Possible causes:

1. GPU OOM
2. huggingface OOM
3. max_seq_length ([RuntimeError: cuda runtime error (59) : device-side assert triggered #97](https://github.com/huggingface/transformers/issues/97))
4. A fix for the "Assertion 'srcIndex < srcSelectDimSize' failed" error raised when pretraining your own model with huggingface's transformers

Daemon-ser commented 3 years ago

May I ask which transformers version the author is currently using? I've Googled this bug a lot; the API (e.g. tokenizer arguments) may have changed.

Indeed, the error does look like an index out of range, quite possibly a tokenizer issue that produces different indices after encoding; a transformers update may have changed the tokenizer. I'll try downgrading transformers.

Daemon-ser commented 3 years ago

May I ask which transformers version the author is currently using? I've Googled this bug a lot; the API (e.g. tokenizer arguments) may have changed. Possible causes:

1. GPU OOM
2. huggingface OOM
3. max_seq_length (RuntimeError: cuda runtime error (59) : device-side assert triggered, huggingface/transformers#97)
4. A fix for the "Assertion 'srcIndex < srcSelectDimSize' failed" error raised when pretraining your own model with huggingface's transformers

Thanks. After downgrading to 2.4 I get the same error; I'll take a closer look at those fixes.

Ethan-yt commented 3 years ago

After disabling CUDA, the error message is:

  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1448, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/ethan/Projects/transformers/examples/multiple-choice/run_swag.py", line 405, in <module>
    main()
  File "/Users/ethan/Projects/transformers/examples/multiple-choice/run_swag.py", line 367, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 996, in train
    tr_loss += self.training_step(model, inputs)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 1399, in training_step
    loss = self.compute_loss(model, inputs)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 1429, in compute_loss
    outputs = model(**inputs)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 1249, in forward
    return_dict=return_dict,
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 805, in forward
    past_key_values_length=past_key_values_length,
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 117, in forward
    token_type_embeddings = self.token_type_embeddings(token_type_ids)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

Focus on this line:

  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 117, in forward
    token_type_embeddings = self.token_type_embeddings(token_type_ids)

So token_type_embeddings is where it fails. Debugging shows that the input data's token_type_ids contain both 0 and 1.

Because RoBERTa dropped BERT's next sentence prediction task, the only supported token_type_id is 0.

Ethan-yt commented 3 years ago

Fix: modify the data-loading code so that every token_type_id is set to 0.

Why other models work: their "RoBERTa" checkpoints are RoBERTa in name only; under the hood they are still BERT.

For more on token_type_id, see these references:
https://huggingface.co/transformers/model_doc/roberta.html#transformers.RobertaTokenizer.create_token_type_ids_from_sequences
https://huggingface.co/transformers/glossary.html#token-type-ids
https://github.com/huggingface/transformers/issues/1114
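
A minimal sketch of that fix (shown on a raw tokenizer call; in a script like run_swag.py the same overwrite belongs in the preprocessing function):

from transformers import AutoTokenizer

# Sketch: after encoding a sentence pair, overwrite the BERT-style segment ids
# with zeros so every token_type_id fits RoBERTa's type table of size 1.
tokenizer = AutoTokenizer.from_pretrained("ethanyt/guwenbert-base")
enc = tokenizer("学而时习之", "不亦说乎")
enc["token_type_ids"] = [0] * len(enc["input_ids"])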

Daemon-ser commented 3 years ago

After disabling CUDA, the error message is:

  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1448, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/ethan/Projects/transformers/examples/multiple-choice/run_swag.py", line 405, in <module>
    main()
  File "/Users/ethan/Projects/transformers/examples/multiple-choice/run_swag.py", line 367, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 996, in train
    tr_loss += self.training_step(model, inputs)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 1399, in training_step
    loss = self.compute_loss(model, inputs)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 1429, in compute_loss
    outputs = model(**inputs)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 1249, in forward
    return_dict=return_dict,
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 805, in forward
    past_key_values_length=past_key_values_length,
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 117, in forward
    token_type_embeddings = self.token_type_embeddings(token_type_ids)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

Focus on this line:

  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 117, in forward
    token_type_embeddings = self.token_type_embeddings(token_type_ids)

So token_type_embeddings is where it fails. Debugging shows that the input data's token_type_ids contain both 0 and 1.

Because RoBERTa dropped BERT's next sentence prediction task, the only supported token_type_id is 0.

Thanks, got it.

Daemon-ser commented 3 years ago

Fix: modify the data-loading code so that every token_type_id is set to 0.

Why other models work: their "RoBERTa" checkpoints are RoBERTa in name only; under the hood they are still BERT.

For more on token_type_id, see these references:
https://huggingface.co/transformers/model_doc/roberta.html#transformers.RobertaTokenizer.create_token_type_ids_from_sequences
https://huggingface.co/transformers/glossary.html#token-type-ids
huggingface/transformers#1114

Were the RoBERTa models on huggingface all trained with NSP, or was NSP kept when training RoBERTa just for API compatibility? Thanks a lot!

jackhuntcn commented 3 years ago

Thanks to the author. The folks above are all competing in the Haihua tech-track competition, right? :)

Daemon-ser commented 3 years ago

Thanks to the author. The folks above are all competing in the Haihua tech-track competition, right? :)

Yes, there's quite a lot of Classical Chinese in it, and I want to see whether this BERT can improve the results.

Ethan-yt commented 3 years ago

RoBERTa itself dropped the NSP task but still kept this embedding. Since the ids are all 0, it has no effect on the overall embedding.
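
This is easy to confirm from the config (a sketch; a value of 1 is the expected RoBERTa setting, whereas a BERT config would typically report 2):

from transformers import AutoConfig

# Sketch: RoBERTa keeps a token-type embedding table, but its size is 1,
# so 0 is the only valid token_type_id.
config = AutoConfig.from_pretrained("ethanyt/guwenbert-base")
print(config.type_vocab_size)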

Daemon-ser commented 3 years ago

RoBERTa itself dropped the NSP task but still kept this embedding. Since the ids are all 0, it has no effect on the overall embedding.

Got it, thanks.

Ethan-yt commented 3 years ago

I'm not taking part in that competition myself; if you find an improvement, please share the comparison. Looking forward to your good news :)

yanqiangmiffy commented 3 years ago

Fix: modify the data-loading code so that every token_type_id is set to 0.

Why other models work: their "RoBERTa" checkpoints are RoBERTa in name only; under the hood they are still BERT.

For more on token_type_id, see these references:
https://huggingface.co/transformers/model_doc/roberta.html#transformers.RobertaTokenizer.create_token_type_ids_from_sequences
https://huggingface.co/transformers/glossary.html#token-type-ids
huggingface/transformers#1114

Thanks to the author for the explanation.

Lirsakura commented 3 years ago

Setting them all to 0 still doesn't solve it for me. Which versions should I be using?

Lirsakura commented 3 years ago

(quoting the diagnosis above: after disabling CUDA, the traceback points at token_type_embeddings; the input token_type_ids contain both 0 and 1, and RoBERTa only supports 0)

Thanks, got it.

Did you manage to solve it? Simply using token_type_ids = torch.zeros_like(token_type_ids) doesn't seem to work for me.

Daemon-ser commented 3 years ago

(quoting the diagnosis above)

Did you manage to solve it? Simply using token_type_ids = torch.zeros_like(token_type_ids) doesn't seem to work for me.

Following his suggestion, setting them all to 0 worked for me.
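
For anyone still stuck, a self-contained sketch of the working approach (one possible pitfall, an assumption not confirmed in this thread, is zeroing a local tensor that is never passed back to the model):

import torch
from transformers import AutoModel, AutoTokenizer

# Sketch: zero the token_type_ids on the batch that is actually fed to the
# model, then run the forward pass.
tokenizer = AutoTokenizer.from_pretrained("ethanyt/guwenbert-base")
model = AutoModel.from_pretrained("ethanyt/guwenbert-base")

batch = tokenizer("温故而知新", "可以为师矣", return_tensors="pt")
batch["token_type_ids"] = torch.zeros_like(batch["token_type_ids"])
outputs = model(**batch)  # runs without the IndexError once every id is 0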