The error is as follows:
```
  0%| | 0/420 [00:00<?, ?it/s]You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
Traceback (most recent call last):
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 748, in convert_to_tensors
    tensor = as_tensor(value)
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 720, in as_tensor
    return torch.tensor(value)
ValueError: expected sequence of length 353 at dim 1 (got 563)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/train/codeshell/finetune/finetune.py", line 220, in <module>
    train()
  File "/root/train/codeshell/finetune/finetune.py", line 214, in train
    trainer.train()
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/transformers/trainer.py", line 1591, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/transformers/trainer.py", line 1870, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/accelerate/data_loader.py", line 448, in __iter__
    current_batch = next(dataloader_iter)
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/transformers/trainer_utils.py", line 737, in __call__
    return self.data_collator(features)
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/transformers/data/data_collator.py", line 249, in __call__
    batch = self.tokenizer.pad(
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3303, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 223, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/root/miniconda3/envs/codeshell/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 764, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`labels` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
```
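For what it's worth, the inner ValueError reproduces in isolation: torch.tensor cannot stack ragged nested lists, which appears to be what tokenizer.pad runs into when the labels field holds sequences of different lengths (353 vs. 563 above). A minimal sketch:

```python
import torch

# Two "labels" sequences of different lengths, shortened from the
# 353- and 563-token sequences in the failing batch.
labels = [[1, 2, 3], [1, 2, 3, 4, 5]]

torch.tensor(labels)
# ValueError: expected sequence of length 3 at dim 1 (got 5)
```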
This happened while running LoRA with the run_finetune.sh script, which I had only slightly modified.
The transformers version is 4.34.0, and I haven't been able to find a solution anywhere. Any help would be appreciated.
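From reading the trace, data_collator.py calls tokenizer.pad, which pads input_ids and attention_mask but passes the already-ragged labels list straight through to tensor conversion. One direction I'm considering is a collator that also pads labels, e.g. DataCollatorForSeq2Seq. The sketch below is a self-contained repro of that behavior, not the repo's code (the gpt2 tokenizer is only a stand-in for the CodeShell one):

```python
from transformers import AutoTokenizer, DataCollatorForSeq2Seq

# Stand-in tokenizer (assumption: gpt2 keeps the sketch self-contained;
# the real script loads the CodeShell tokenizer).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

# Unlike the default collator, DataCollatorForSeq2Seq also pads `labels`,
# filling with -100 so padded positions are ignored by the loss.
collator = DataCollatorForSeq2Seq(tokenizer, label_pad_token_id=-100)

# Two features whose labels differ in length, like the failing batch.
features = [
    {"input_ids": [10, 11, 12], "attention_mask": [1, 1, 1],
     "labels": [10, 11, 12]},
    {"input_ids": [10, 11, 12, 13, 14], "attention_mask": [1, 1, 1, 1, 1],
     "labels": [10, 11, 12, 13, 14]},
]

batch = collator(features)
print(batch["labels"])
# tensor([[  10,   11,   12, -100, -100],
#         [  10,   11,   12,   13,   14]])
```

Whether swapping the collator is the right fix presumably depends on how finetune.py builds labels; padding/truncating to a fixed max length inside the tokenize function would avoid the ragged batch as well.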