RUCAIBox / TextBox

TextBox 2.0 is a text generation library with pre-trained language models
https://github.com/RUCAIBox/TextBox
MIT License

[🐛BUG] I ran into a problem when using the mBART model with WMT19 zh-en. #346

Open 01vanilla opened 1 year ago

01vanilla commented 1 year ago

Describe the bug: I ran into the following problem when using the mBART model with the WMT19 zh-en dataset.

To reproduce: run_textbox.py --model=mBART --model_path=facebook/mbart-large-cc25 --dataset=wmt19-zh-en --src_lang=zh_CN --tgt_lang=en_XX

Log:

23 Apr 00:43 INFO Pretrain type: pretrain disabled
:1: SyntaxWarning: 'int' object is not callable; perhaps you missed a comma?
:1: SyntaxWarning: list indices must be integers or slices, not tuple; perhaps you missed a comma?
:1: SyntaxWarning: 'str' object is not callable; perhaps you missed a comma?
(the 'int' and 'list indices' warnings above each repeat many times in the log)
Token indices sequence length is longer than the specified maximum sequence length for this model (1776 > 1024). Running this sequence through the model will result in indexing errors
Traceback (most recent call last):
  File "run_textbox.py", line 15, in <module>
    run_textbox(model=args.model, dataset=args.dataset, config_file_list=args.config_files, config_dict={})
  File "/hy-tmp/TextBox/textbox/quick_start/quick_start.py", line 20, in run_textbox
    experiment = Experiment(model, dataset, config_file_list, config_dict)
  File "/hy-tmp/TextBox/textbox/quick_start/experiment.py", line 56, in __init__
    self._init_data(self.get_config(), self.accelerator)
  File "/hy-tmp/TextBox/textbox/quick_start/experiment.py", line 82, in _init_data
    train_data, valid_data, test_data = data_preparation(config, tokenizer)
  File "/hy-tmp/TextBox/textbox/data/utils.py", line 24, in data_preparation
    train_dataset.tokenize(tokenizer)
  File "/hy-tmp/TextBox/textbox/data/abstract_dataset.py", line 120, in tokenize
    ids = tokenizer(
  File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2538, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2624, in _call_one
    return self.batch_encode_plus(
  File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2815, in batch_encode_plus
    return self._batch_encode_plus(
  File "/usr/local/miniconda3/envs/TextBox/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 428, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

I am using transformers 4.28.1 and torch 2.0.0+cu117.

StevenTang1998 commented 1 year ago

As a temporary workaround, you can comment out lines 27~34 of https://github.com/RUCAIBox/TextBox/blob/2.0.0/textbox/data/misc.py. We will fix this as soon as possible.

StevenTang1998 commented 1 year ago

If you run into any further problems, feel free to keep asking.
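
For anyone debugging the same TypeError: this message from the fast tokenizer usually means that some element of the batch passed to it is not a plain string (for example None, a number, or a nested list). The snippet below is a minimal sketch for locating such entries before tokenization. It is not TextBox code, and it does not confirm what caused the error in this issue. The helper name `find_non_string_inputs` and the sample batch are hypothetical.

```python
from typing import Any, List, Tuple


def find_non_string_inputs(texts: List[Any]) -> List[Tuple[int, str]]:
    """Return (index, type name) for every entry that is not a str."""
    return [
        (i, type(t).__name__)
        for i, t in enumerate(texts)
        if not isinstance(t, str)
    ]


# Hypothetical sample batch; real inputs would be the source/target lines
# that are about to be handed to the tokenizer.
sample = ["你好,世界", None, "hello world", ["nested", "list"], 3.14]
bad_entries = find_non_string_inputs(sample)
print(bad_entries)  # [(1, 'NoneType'), (3, 'list'), (4, 'float')]
```

If the helper returns an empty list for the texts being tokenized, the inputs are all strings and the cause is more likely elsewhere, such as the preprocessing lines mentioned in the reply above.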