Ethan-yt / guwenbert

GuwenBERT: A Pre-trained Language Model for Classical Chinese (Literary Chinese)
Apache License 2.0
503 stars 38 forks

Feeding in two sentences causes an index out of range error #11

Closed · Daemon-ser closed this issue 3 years ago

Daemon-ser commented 3 years ago

The code runs with other model names (only the model name was changed; everything uses the transformers Auto classes). With chinese-bert-wwm or other models it works, but with guwenbert it raises a CUDA error during the model's forward computation. [screenshot of the error]

yanqiangmiffy commented 3 years ago

Same error here; it's probably a mismatch between the transformers and torch versions.

Ethan-yt commented 3 years ago

Could you paste your code?

yanqiangmiffy commented 3 years ago

Could you paste your code?

https://github.com/huggingface/transformers/tree/master/examples/multiple-choice

I ran into this problem before, though on an NER task. I eventually solved it by changing the transformers and torch versions, switching back and forth between different versions, which was painful and tedious. https://github.com/z814081807/DeepNER/issues/1#issuecomment-757106404

Ethan-yt commented 3 years ago

The error log shows an index out of range. Check the vocabulary and the training data, e.g. are any sentences too long? If the data looks fine, check the network structure: the output of AutoModel is a vector, and a classification task still needs an FFN layer on top.
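
For instance, a minimal sanity check along these lines (a sketch using the model id mentioned later in this thread, not the reporter's actual code) makes an overflow visible before the forward pass:

from transformers import AutoModel, AutoTokenizer

# Sketch: check whether every encoded id fits its embedding table.
tokenizer = AutoTokenizer.from_pretrained("ethanyt/guwenbert-base")
model = AutoModel.from_pretrained("ethanyt/guwenbert-base")

enc = tokenizer("己所不欲", "勿施于人", return_tensors="pt")
# Each maximum must be strictly smaller than the corresponding table size:
print(enc["input_ids"].max().item(), "<", model.config.vocab_size)
print(enc["token_type_ids"].max().item(), "<", model.config.type_vocab_size)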

Daemon-ser commented 3 years ago

The point is that only the model name changes in the code. With chinese-bert-wwm or any other Chinese model it runs fine; with ethanyt/guwenbert-base or large it fails. It may be the version issue @yanqiangmiffy mentioned...

Ethan-yt commented 3 years ago

The point is that only the model name changes in the code. With chinese-bert-wwm or any other Chinese model it runs fine; with ethanyt/guwenbert-base or large it fails. It may be the version issue @yanqiangmiffy mentioned...

CUDA errors can usually be debugged on the CPU. Running the same code on CPU only (remove model.cuda()) makes it much easier to see where the index goes out of range.
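
A minimal sketch of that workflow (assuming the pair-encoding behaviour described in this thread; on CPU the failure becomes an ordinary Python traceback):

from transformers import AutoModel, AutoTokenizer

# Sketch: reproduce the failure on CPU so the traceback names the exact
# embedding lookup instead of a vague CUDA device-side assert.
tokenizer = AutoTokenizer.from_pretrained("ethanyt/guwenbert-base")
model = AutoModel.from_pretrained("ethanyt/guwenbert-base")  # note: no .cuda()

batch = tokenizer("学而时习之", "不亦说乎", return_tensors="pt")
outputs = model(**batch)  # raises IndexError: index out of range in self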

Daemon-ser commented 3 years ago

The point is that only the model name changes in the code. With chinese-bert-wwm or any other Chinese model it runs fine; with ethanyt/guwenbert-base or large it fails. It may be the version issue @yanqiangmiffy mentioned...

CUDA errors can usually be debugged on the CPU. Running the same code on CPU only (remove model.cuda()) makes it much easier to see where the index goes out of range.

OK.

yanqiangmiffy commented 3 years ago

The error log shows an index out of range. Check the vocabulary and the training data, e.g. are any sentences too long? If the data looks fine, check the network structure: the output of AutoModel is a vector, and a classification task still needs an FFN layer on top.

Right, at first I also thought it was an array index out of range or GPU OOM, but in both tasks (NER and MRC) the error persisted even with batch size and max_len set to low values. In the NER task every model I tried raised the error; in MRC only this model did. I eventually solved it by pinning the transformers and torch versions.

yanqiangmiffy commented 3 years ago

Environment where NER worked:

Package           Version
----------------- -------------------
certifi           2020.6.20
chardet           4.0.0
click             7.1.2
configparser      5.0.1
dataclasses       0.8
docker-pycreds    0.4.0
filelock          3.0.12
gitdb             4.0.5
GitPython         3.1.12
idna              2.10
joblib            1.0.0
numpy             1.19.5
pandas            1.1.5
Pillow            8.1.0
pip               20.2.4
promise           2.3
protobuf          3.14.0
psutil            5.8.0
pyltp             0.2.1
python-dateutil   2.8.1
pytorch-crf       0.7.2
pytz              2020.5
PyYAML            5.3.1
regex             2020.11.13
requests          2.25.1
sacremoses        0.0.43
scikit-learn      0.24.0
scipy             1.5.4
sentencepiece     0.1.94
sentry-sdk        0.19.5
setuptools        50.3.0.post20201006
shortuuid         1.0.1
six               1.15.0
smmap             3.0.4
subprocess32      3.5.4
threadpoolctl     2.1.0
tokenizers        0.7.0
torch             1.7.1
torchvision       0.8.2
tqdm              4.55.1
transformers      2.10.0
typing-extensions 3.7.4.3
urllib3           1.26.2
wandb             0.10.12
watchdog          1.0.2
wheel             0.35.1

Daemon-ser commented 3 years ago

The error log shows an index out of range. Check the vocabulary and the training data, e.g. are any sentences too long? If the data looks fine, check the network structure: the output of AutoModel is a vector, and a classification task still needs an FFN layer on top.

Right, at first I also thought it was an array index out of range or GPU OOM, but in both tasks (NER and MRC) the error persisted even with batch size and max_len set to low values. In the NER task every model I tried raised the error; in MRC only this model did. I eventually solved it by pinning the transformers and torch versions.

I'm currently on torch 1.7 + transformers 3.4; I'll try downgrading transformers.

yanqiangmiffy commented 3 years ago

May I ask which transformers version the author is currently using? I've Googled this bug a lot; the API (e.g. tokenizer arguments) may have changed. Possible causes:

1. GPU OOM
2. huggingface OOM
3. max_seq_length ([RuntimeError: cuda runtime error (59) : device-side assert triggered #97](https://github.com/huggingface/transformers/issues/97))
4. A fix for the "Assertion 'srcIndex < srcSelectDimSize' failed" error raised when pretraining your own model with huggingface's transformers

Daemon-ser commented 3 years ago

May I ask which transformers version the author is currently using? I've Googled this bug a lot; the API (e.g. tokenizer arguments) may have changed.

Indeed, the error does look like an index out of range, quite possibly a tokenizer issue that produces different indices after encoding; a transformers update may have changed the tokenizer. I'll try downgrading transformers.

Daemon-ser commented 3 years ago

May I ask which transformers version the author is currently using? I've Googled this bug a lot; the API (e.g. tokenizer arguments) may have changed. Possible causes:

1. GPU OOM
2. huggingface OOM
3. max_seq_length (RuntimeError: cuda runtime error (59) : device-side assert triggered, huggingface/transformers#97)
4. A fix for the "Assertion 'srcIndex < srcSelectDimSize' failed" error raised when pretraining your own model with huggingface's transformers

Thanks. After downgrading to 2.4 I get the same error; I'll take a closer look at those fixes.

Ethan-yt commented 3 years ago

After disabling CUDA, the error message is:

  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1448, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/ethan/Projects/transformers/examples/multiple-choice/run_swag.py", line 405, in <module>
    main()
  File "/Users/ethan/Projects/transformers/examples/multiple-choice/run_swag.py", line 367, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 996, in train
    tr_loss += self.training_step(model, inputs)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 1399, in training_step
    loss = self.compute_loss(model, inputs)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 1429, in compute_loss
    outputs = model(**inputs)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 1249, in forward
    return_dict=return_dict,
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 805, in forward
    past_key_values_length=past_key_values_length,
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 117, in forward
    token_type_embeddings = self.token_type_embeddings(token_type_ids)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

Focus on this line:

  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 117, in forward
    token_type_embeddings = self.token_type_embeddings(token_type_ids)

So token_type_embeddings is where it fails. Debugging shows that the input data's token_type_ids contain both 0 and 1.

Because RoBERTa dropped BERT's next sentence prediction task, the only supported token_type_id is 0.

Ethan-yt commented 3 years ago

Fix: modify the data-loading code so that every token_type_id is set to 0.

Why other models work: their "RoBERTa" checkpoints are RoBERTa in name only; under the hood they are still BERT.

For more on token_type_id, see these references:
https://huggingface.co/transformers/model_doc/roberta.html#transformers.RobertaTokenizer.create_token_type_ids_from_sequences
https://huggingface.co/transformers/glossary.html#token-type-ids
https://github.com/huggingface/transformers/issues/1114
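
A minimal sketch of that fix (shown on a raw tokenizer call; in a script like run_swag.py the same overwrite belongs in the preprocessing function):

from transformers import AutoTokenizer

# Sketch: after encoding a sentence pair, overwrite the BERT-style segment ids
# with zeros so every token_type_id fits RoBERTa's type table of size 1.
tokenizer = AutoTokenizer.from_pretrained("ethanyt/guwenbert-base")
enc = tokenizer("学而时习之", "不亦说乎")
enc["token_type_ids"] = [0] * len(enc["input_ids"])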

Daemon-ser commented 3 years ago

After disabling CUDA, the error message is:

  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1448, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/ethan/Projects/transformers/examples/multiple-choice/run_swag.py", line 405, in <module>
    main()
  File "/Users/ethan/Projects/transformers/examples/multiple-choice/run_swag.py", line 367, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 996, in train
    tr_loss += self.training_step(model, inputs)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 1399, in training_step
    loss = self.compute_loss(model, inputs)
  File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 1429, in compute_loss
    outputs = model(**inputs)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 1249, in forward
    return_dict=return_dict,
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 805, in forward
    past_key_values_length=past_key_values_length,
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 117, in forward
    token_type_embeddings = self.token_type_embeddings(token_type_ids)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

Focus on this line:

  File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 117, in forward
    token_type_embeddings = self.token_type_embeddings(token_type_ids)

So token_type_embeddings is where it fails. Debugging shows that the input data's token_type_ids contain both 0 and 1.

Because RoBERTa dropped BERT's next sentence prediction task, the only supported token_type_id is 0.

Thanks, got it.

Daemon-ser commented 3 years ago

Fix: modify the data-loading code so that every token_type_id is set to 0.

Why other models work: their "RoBERTa" checkpoints are RoBERTa in name only; under the hood they are still BERT.

For more on token_type_id, see these references:
https://huggingface.co/transformers/model_doc/roberta.html#transformers.RobertaTokenizer.create_token_type_ids_from_sequences
https://huggingface.co/transformers/glossary.html#token-type-ids
huggingface/transformers#1114

Were the RoBERTa models on huggingface all trained with NSP, or was NSP kept when training RoBERTa just for API compatibility? Thanks a lot!

jackhuntcn commented 3 years ago

Thanks to the author. The folks above are all competing in the Haihua tech-track competition, right? :)

Daemon-ser commented 3 years ago

Thanks to the author. The folks above are all competing in the Haihua tech-track competition, right? :)

Yes, there's quite a lot of Classical Chinese in it, and I want to see whether this BERT can improve the results.

Ethan-yt commented 3 years ago

RoBERTa itself dropped the NSP task but still kept this embedding. Since the ids are all 0, it has no effect on the overall embedding.
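
This is easy to confirm from the config (a sketch; a value of 1 is the expected RoBERTa setting, whereas a BERT config would typically report 2):

from transformers import AutoConfig

# Sketch: RoBERTa keeps a token-type embedding table, but its size is 1,
# so 0 is the only valid token_type_id.
config = AutoConfig.from_pretrained("ethanyt/guwenbert-base")
print(config.type_vocab_size)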

Daemon-ser commented 3 years ago

RoBERTa itself dropped the NSP task but still kept this embedding. Since the ids are all 0, it has no effect on the overall embedding.

Got it, thanks.

Ethan-yt commented 3 years ago

I'm not taking part in that competition myself; if you find an improvement, please share the comparison. Looking forward to your good news :)

yanqiangmiffy commented 3 years ago

Fix: modify the data-loading code so that every token_type_id is set to 0.

Why other models work: their "RoBERTa" checkpoints are RoBERTa in name only; under the hood they are still BERT.

For more on token_type_id, see these references:
https://huggingface.co/transformers/model_doc/roberta.html#transformers.RobertaTokenizer.create_token_type_ids_from_sequences
https://huggingface.co/transformers/glossary.html#token-type-ids
huggingface/transformers#1114

Thanks to the author for the explanation.

Lirsakura commented 3 years ago

Setting them all to 0 still doesn't solve it for me. Which versions should I be using?

Lirsakura commented 3 years ago

(quoting the diagnosis above: after disabling CUDA, the traceback points at token_type_embeddings; the input token_type_ids contain both 0 and 1, and RoBERTa only supports 0)

Thanks, got it.

Did you manage to solve it? Simply using token_type_ids = torch.zeros_like(token_type_ids) doesn't seem to work for me.

Daemon-ser commented 3 years ago

(quoting the diagnosis above)

Did you manage to solve it? Simply using token_type_ids = torch.zeros_like(token_type_ids) doesn't seem to work for me.

Following his suggestion, setting them all to 0 worked for me.
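
For anyone still stuck, a self-contained sketch of the working approach (one possible pitfall, an assumption not confirmed in this thread, is zeroing a local tensor that is never passed back to the model):

import torch
from transformers import AutoModel, AutoTokenizer

# Sketch: zero the token_type_ids on the batch that is actually fed to the
# model, then run the forward pass.
tokenizer = AutoTokenizer.from_pretrained("ethanyt/guwenbert-base")
model = AutoModel.from_pretrained("ethanyt/guwenbert-base")

batch = tokenizer("温故而知新", "可以为师矣", return_tensors="pt")
batch["token_type_ids"] = torch.zeros_like(batch["token_type_ids"])
outputs = model(**batch)  # runs without the IndexError once every id is 0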