Closed Daemon-ser closed 3 years ago
I get the same error. It is probably a mismatch between the transformers and torch versions.
Could you post your code?
https://github.com/huggingface/transformers/tree/master/examples/multiple-choice
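I ran into this problem before, though on an NER task. I only solved it in the end by changing the transformers and torch versions, switching between different versions by trial and error, which was painful and tedious: https://github.com/z814081807/DeepNER/issues/1#issuecomment-757106404
The error log points to an index out of range, so check the vocabulary and the training data. For example, are any sentences too long? If the data is fine, check the network structure: the output of AutoModel is a vector, so a classification task still needs an FFN layer on top.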
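The data check suggested above could be sketched like this. It is only a minimal illustration: `check_batch`, the sample ids, and the two limits are made up for the example. In a real run the limits would presumably come from the model config (`vocab_size` and `max_position_embeddings`), and some architectures may allow fewer usable positions than the config value.

```python
def check_batch(input_ids, vocab_size, max_positions):
    """Return readable descriptions of rows that would index past a table."""
    problems = []
    for row, ids in enumerate(input_ids):
        # A sequence longer than the position table overflows the position embedding.
        if len(ids) > max_positions:
            problems.append(f"row {row}: length {len(ids)} > {max_positions} positions")
        # A token id outside [0, vocab_size) overflows the word embedding.
        bad = [i for i in ids if not 0 <= i < vocab_size]
        if bad:
            problems.append(f"row {row}: ids {bad} outside [0, {vocab_size})")
    return problems


# Hypothetical batch and limits, chosen so the second row fails both checks.
batch = [[101, 2769, 102], [101, 30000, 102, 0, 0, 0]]
problems = check_batch(batch, vocab_size=21128, max_positions=4)
for line in problems:
    print(line)
```

Running a check like this over the whole dataset before training is cheap and rules out the vocabulary and sentence-length explanations early.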
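The key point is that the code changes nothing but the model name. With chinese-bert-wwm or any other Chinese model it runs, but with ethanyt/guwenbert-base or large it fails. It may be the version problem that @yanqiangmiffy mentioned...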
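CUDA errors can usually be debugged on the CPU. Running the same code on the CPU only (remove model.cuda()) makes it much easier to see where the index goes out of range.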
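Besides deleting model.cuda(), one way to force a CPU run is to hide the GPUs through an environment variable before torch is imported. The sketch below is an assumption about how one might set this up, not part of the original script. `configure_cuda_debugging` is a hypothetical helper, while `CUDA_VISIBLE_DEVICES` and `CUDA_LAUNCH_BLOCKING` are standard CUDA environment variables.

```python
import os


def configure_cuda_debugging(force_cpu=True):
    """Set environment variables that make index errors easier to locate.

    Call this before `import torch`, otherwise the settings may be ignored.
    """
    if force_cpu:
        # Hiding every GPU makes code that checks for CUDA fall back to the CPU,
        # where an out-of-range index raises a normal Python IndexError.
        settings = {"CUDA_VISIBLE_DEVICES": ""}
    else:
        # Stay on the GPU but launch kernels synchronously, so the Python
        # traceback points at the operation that actually failed.
        settings = {"CUDA_LAUNCH_BLOCKING": "1"}
    os.environ.update(settings)
    return settings


applied = configure_cuda_debugging(force_cpu=True)
print(applied)
```

Code that calls model.cuda() unconditionally would still need that call removed, as described above.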
OK.
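Right. At first I also thought it was an array index out of range or a GPU OOM, but in both tasks (NER and MRC) the error still occurred after I set bs and max_len to low values. In the NER task every model failed; in MRC the current model fails. I eventually solved it by adjusting the transformers and torch versions.
Environment in which NER worked: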
Package Version
----------------- -------------------
certifi 2020.6.20
chardet 4.0.0
click 7.1.2
configparser 5.0.1
dataclasses 0.8
docker-pycreds 0.4.0
filelock 3.0.12
gitdb 4.0.5
GitPython 3.1.12
idna 2.10
joblib 1.0.0
numpy 1.19.5
pandas 1.1.5
Pillow 8.1.0
pip 20.2.4
promise 2.3
protobuf 3.14.0
psutil 5.8.0
pyltp 0.2.1
python-dateutil 2.8.1
pytorch-crf 0.7.2
pytz 2020.5
PyYAML 5.3.1
regex 2020.11.13
requests 2.25.1
sacremoses 0.0.43
scikit-learn 0.24.0
scipy 1.5.4
sentencepiece 0.1.94
sentry-sdk 0.19.5
setuptools 50.3.0.post20201006
shortuuid 1.0.1
six 1.15.0
smmap 3.0.4
subprocess32 3.5.4
threadpoolctl 2.1.0
tokenizers 0.7.0
torch 1.7.1
torchvision 0.8.2
tqdm 4.55.1
transformers 2.10.0
typing-extensions 3.7.4.3
urllib3 1.26.2
wandb 0.10.12
watchdog 1.0.2
wheel 0.35.1
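To compare a local environment against the list above, the installed versions can be printed with the standard library. This is only a sketch: `installed_versions` is a hypothetical helper, and the three package names are simply the ones discussed in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["torch", "transformers", "tokenizers"])
for name, version in report.items():
    print(f"{name:<15} {version or 'not installed'}")
```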
I'm currently on torch 1.7 + transformers 3.4. I'll try downgrading transformers.
Which version of transformers is the author currently using? I googled this bug a lot earlier, and the API (for example the tokenizer parameters) may have changed. Possible causes:
1. GPU OOM
2. huggingface OOM
3. max_seq_length: [RuntimeError: cuda runtime error (59) : device-side assert triggered #97](https://github.com/huggingface/transformers/issues/97)
4. API: "How to fix Assertion 'srcIndex < srcSelectDimSize' failed. when pretraining your own model with huggingface's transformers"
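Indeed, the error looks like an index out of range. It is very likely a tokenizer problem that makes the indices differ after encoding; a transformers update may have changed the tokenizer. I'll try downgrading transformers.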
Thanks. After downgrading to 2.4 I get the same error. I'll take a closer look at these fixes.
After disabling CUDA, the error message is:
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1448, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/ethan/Projects/transformers/examples/multiple-choice/run_swag.py", line 405, in <module>
main()
File "/Users/ethan/Projects/transformers/examples/multiple-choice/run_swag.py", line 367, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 996, in train
tr_loss += self.training_step(model, inputs)
File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 1399, in training_step
loss = self.compute_loss(model, inputs)
File "/Users/ethan/Projects/transformers/src/transformers/trainer.py", line 1429, in compute_loss
outputs = model(**inputs)
File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 1249, in forward
return_dict=return_dict,
File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 805, in forward
past_key_values_length=past_key_values_length,
File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 117, in forward
token_type_embeddings = self.token_type_embeddings(token_type_ids)
File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/Users/ethan/miniconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1852, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
Focus on this line:
File "/Users/ethan/Projects/transformers/src/transformers/models/roberta/modeling_roberta.py", line 117, in forward
token_type_embeddings = self.token_type_embeddings(token_type_ids)
The failure is in token_type_embeddings. Debugging shows that the token_type_ids in the input data take two values, 0 and 1.
Because RoBERTa dropped BERT's next sentence prediction task, token_type_id only supports 0.
Fix: change the data-reading code so that every token_type_id is set to 0.
Why the other models work: their "RoBERTa" is not a real one; underneath they are still BERT.
For more on token_type_id, see these references: https://huggingface.co/transformers/model_doc/roberta.html#transformers.RobertaTokenizer.create_token_type_ids_from_sequences https://huggingface.co/transformers/glossary.html#token-type-ids https://github.com/huggingface/transformers/issues/1114
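A minimal sketch of that fix, assuming the encoded features are plain nested lists as a tokenizer returns them before they are turned into tensors. `zeros_like_nested`, `zero_token_type_ids`, and the sample ids are hypothetical names and values for illustration; the nesting is handled recursively because a multiple-choice example holds one sequence per choice.

```python
def zeros_like_nested(value):
    """Return a structure of the same shape with every leaf replaced by 0."""
    if isinstance(value, list):
        return [zeros_like_nested(item) for item in value]
    return 0


def zero_token_type_ids(features):
    """Return a shallow copy of `features` whose token_type_ids are all 0."""
    fixed = dict(features)
    if "token_type_ids" in fixed:
        fixed["token_type_ids"] = zeros_like_nested(fixed["token_type_ids"])
    return fixed


# Hypothetical multiple-choice example: one question with two choices.
features = {
    "input_ids": [[101, 2769, 102, 872, 102], [101, 2769, 102, 1962, 102]],
    "token_type_ids": [[0, 0, 0, 1, 1], [0, 0, 0, 1, 1]],
    "attention_mask": [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1]],
}
fixed = zero_token_type_ids(features)
print(fixed["token_type_ids"])
```

The zeroing has to happen on the data that is actually passed to the model, for example in the preprocessing function, so that no batch still carries a 1.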
Thanks, got it.
Were the RoBERTa models in huggingface all trained with NSP, or was RoBERTa trained with NSP anyway for the sake of API compatibility? Thank you!
Thanks to the author. Are the folks above all competing in the Haihua competition (technical track)? :)
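Yes, it contains quite a lot of classical Chinese, so I wanted to see whether this BERT could improve the results.
RoBERTa itself dropped the NSP task but still keeps this embedding. The ids are all 0, though, so it has no effect on the overall embedding.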
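OK, thanks.
I'm not taking part in this competition. If you find an improvement, please share the comparison results. Looking forward to your good news :)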
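Thanks to the author for the answer.
Setting everything to 0 still doesn't solve the problem for me. Which versions should I be using?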
Have you solved the problem? I directly used token_type_ids = torch.zeros_like(token_type_ids), but it doesn't seem to work.
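After setting everything to 0 as he said, it works for me.
Other model names work (the code changes only the model name and uses the transformers Auto classes throughout): chinese-bert-wwm and other models run, but guwenbert raises a CUDA error during the model's forward pass.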