[Question]: OSError: (External) CUDA error(700), an illegal memory access was encountered.

littlesmallrookie commented 1 year ago

Please describe your question

While fine-tuning for NLP document extraction, training is interrupted on its own after a few epochs, and the point at which it stops is unpredictable. Training command:

python3.7 finetune.py --device cpu --logging_steps 5 --save_steps 100 --eval_steps 100 --seed 42 --model_name_or_path uie-x-base --output_dir ./checkpointtest1/model_best --train_path train/data/4/train.txt --dev_path train/data/4/dev.txt --max_seq_len 512 --per_device_train_batch_size 4 --per_device_eval_batch_size 2 --num_train_epochs 80 --learning_rate 1e-5 --do_train --do_eval --do_export --export_model_dir ./checkpointtest1/model_best --overwrite_output_dir --disable_tqdm True --metric_for_best_model eval_f1 --load_best_model_at_end True --save_total_limit 1

This happens frequently, with the following error:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/tokenizer_utils_base.py", line 698, in convert_to_tensors
    tensor = as_tensor(value)
  File "/usr/local/lib/python3.7/dist-packages/paddle/tensor/creation.py", line 546, in to_tensor
    return _to_tensor_non_static(data, dtype, place, stop_gradient)
  File "/usr/local/lib/python3.7/dist-packages/paddle/tensor/creation.py", line 411, in _to_tensor_non_static
    stop_gradient=stop_gradient,
OSError: (External) CUDA error(700), an illegal memory access was encountered.
  [Hint: Please search for the error code(700) on website (https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038) to get Nvidia's official solution and advice about CUDA Error.] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:259)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 218, in _thread_loop
    self._thread_done_event)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/fetcher.py", line 138, in fetch
    data = self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/data/data_collator.py", line 199, in __call__
    return_attention_mask=self.return_attention_mask,
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/tokenizer_utils_base.py", line 2619, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/tokenizer_utils_base.py", line 229, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/tokenizer_utils_base.py", line 708, in convert_to_tensors
    "Unable to create tensor, you should probably activate truncation and/or padding "
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.

Traceback (most recent call last):
  File "finetune.py", line 177, in <module>
    main()
  File "finetune.py", line 147, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/trainer/trainer.py", line 669, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/trainer/trainer.py", line 1350, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/trainer/trainer.py", line 1312, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/ernie_layout/modeling.py", line 1174, in forward
    image=image,
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/ernie_layout/modeling.py", line 775, in forward
    position_ids=visual_position_ids,
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/ernie_layout/modeling.py", line 645, in _calc_img_embeddings
    visual_embeddings = self.visual_act_fn(self.visual_proj(self.visual(image.astype(paddle.float32))))
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/ernie_layout/modeling.py", line 560, in forward
    features = self.backbone(images_input)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/ernie_layout/visual_backbone.py", line 213, in forward
    y = block(y)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/ernie_layout/visual_backbone.py", line 85, in forward
    short = self.short(inputs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/ernie_layout/visual_backbone.py", line 42, in forward
    y = self._batch_norm(y)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/nn.py", line 1375, in forward
    self._trainable_statistics, False)
OSError: (External) CUDNN error(8), CUDNN_STATUS_EXECUTION_FAILED.
  [Hint: Please search for the error code(8) on website (https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnStatus_t) to get Nvidia's official solution and advice about CUDNN Error.] (at /paddle/paddle/phi/kernels/gpu/batch_norm_kernel.cu:1229)
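Note that the ValueError in the first traceback is raised while handling the OSError ("During handling of the above exception, another exception occurred"), so its advice about padding and truncation is probably a red herring; the underlying failure is CUDA error 700. The sketch below uses a hypothetical `convert_to_tensor` stand-in for that wrap-and-reraise pattern (it is not PaddleNLP's code) and a hypothetical `root_cause` helper to show that the original exception remains reachable through `__context__`.

```python
def convert_to_tensor(value):
    """Hypothetical stand-in for a converter that rewraps any failure.

    `value` is ignored here; the function only reproduces the error chain
    seen in the log (OSError first, then ValueError).
    """
    try:
        # Simulate the device-side failure reported by the data-loader thread.
        raise OSError("CUDA error(700), an illegal memory access was encountered.")
    except Exception:
        # Raising inside `except` replaces the visible message, but Python
        # still stores the original exception in __context__.
        raise ValueError(
            "Unable to create tensor, you should probably activate "
            "truncation and/or padding"
        )


def root_cause(exc):
    """Walk __cause__/__context__ back to the first exception in the chain."""
    while True:
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            return exc
        exc = nxt


try:
    convert_to_tensor([[1, 2], [3]])
except ValueError as err:
    cause = root_cause(err)
    # prints: OSError - CUDA error(700), an illegal memory access was encountered.
    print(type(cause).__name__, "-", cause)
```

When reading a chained traceback like the one above, the first exception in the chain is usually the one worth searching for.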

lugimzzz commented 1 year ago

This problem is hard to pin down. Please search for possible causes of the error OSError: (External) CUDNN error(8), CUDNN_STATUS_EXECUTION_FAILED. [Hint: Please search for the error code(8) on website (https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnStatus_t) to get Nvidia's official solution and advice about CUDNN Error.] (at /paddle/paddle/phi/kernels/gpu/batch_norm_kernel.cu:1229)
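CUDA kernel launches are asynchronous, so an error raised by one kernel is often reported by a later, unrelated call; that would fit the log, where the failures surface in batch norm and in tensor creation in the data-loader thread. A common first step, not one suggested in this thread, is to rerun with the standard CUDA environment variable `CUDA_LAUNCH_BLOCKING=1`. A minimal sketch, assuming `finetune.py` is launched as in the issue; the `debug_env` helper and the shortened argument list are hypothetical:

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Copy the environment and make CUDA kernel launches synchronous."""
    env = dict(os.environ if base is None else base)
    # With blocking launches, a faulting kernel is reported at its own call
    # site instead of at some later, unrelated CUDA/cuDNN call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Hypothetical, shortened argument list; reuse the full flag set from the issue.
cmd = [
    sys.executable, "finetune.py",
    "--device", "gpu",
    "--model_name_or_path", "uie-x-base",
    "--per_device_train_batch_size", "2",
    "--per_device_eval_batch_size", "1",
]

if __name__ == "__main__" and os.path.exists("finetune.py"):
    subprocess.run(cmd, env=debug_env(), check=False)
```

Expect training to be noticeably slower with this setting; it is meant for reproducing the crash with a more precise traceback, not for the full 80-epoch run.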

sijunhe commented 1 year ago

Why is the device in your launch command set to cpu?

littlesmallrookie commented 1 year ago

It is gpu; I made a typo here.

littlesmallrookie commented 1 year ago

I reduced the batch size (train batch_size = 2, eval batch_size = 1). During training the GPU utilization shows 100%, and during evaluation about 60%. Training still stops after a short while, although occasionally it runs for some time. Training command:

python3.7 -u -m paddle.distributed.launch --gpus "0" finetune.py --device gpu --logging_steps 5 --save_steps 100 --eval_steps 100 --seed 42 --model_name_or_path uie-x-base --output_dir checkpoint-auto-4-2/model_best --train_path train/data/4/train.txt --dev_path train/data/4/dev.txt --max_seq_len 512 --per_device_train_batch_size 2 --per_device_eval_batch_size 1 --num_train_epochs 80 --learning_rate 1e-5 --do_train --do_eval --do_export --export_model_dir checkpoint-auto-4-2/model_best --overwrite_output_dir --disable_tqdm True --metric_for_best_model eval_f1 --load_best_model_at_end True --save_total_limit 1
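Since this report relies on GPU utilization, logging memory use alongside it may show whether the card is near its limit when the crash happens. A minimal sketch, assuming `nvidia-smi` is on the `PATH`; it uses the standard `--query-gpu=...` and `--format=csv,noheader,nounits` options, and `parse_gpu_stats` is a hypothetical helper:

```python
import shutil
import subprocess

# Standard nvidia-smi query; "nounits" drops the "%" and "MiB" suffixes.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(text):
    """Parse nvidia-smi CSV rows into dicts (memory values in MiB).

    Assumes every field is numeric; some GPUs report "[N/A]" instead.
    """
    stats = []
    for line in text.strip().splitlines():
        idx, util, used, total = (field.strip() for field in line.split(","))
        stats.append({
            "index": int(idx),
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return stats


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for gpu in parse_gpu_stats(out):
        print(gpu)
```

For continuous logging, `nvidia-smi` can also repeat the same query on its own with its `-l SECONDS` option.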

When train batch_size = 1 and eval batch_size = 1, training is interrupted immediately with the following error:

Error: /paddle/paddle/phi/kernels/gpu/bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one) failed. Input is expected to be within the interval [0, 1], but received nan.
(the assertion message above is repeated many more times)

Traceback (most recent call last):
  File "finetune.py", line 177, in <module>
    main()
  File "finetune.py", line 147, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/trainer/trainer.py", line 669, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/trainer/trainer.py", line 1350, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/trainer/trainer.py", line 1312, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/ernie_layout/modeling.py", line 1174, in forward
    image=image,
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/ernie_layout/modeling.py", line 739, in forward
    visual_bbox = self._calc_visual_bbox(self.config["image_feature_pool_shape"], bbox, visual_shape)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/ernie_layout/modeling.py", line 687, in _calc_visual_bbox
    axis=-1,
  File "/usr/local/lib/python3.7/dist-packages/paddle/tensor/manipulation.py", line 1839, in stack
    return _C_ops.stack(x, axis)
OSError: (External) CUDA error(719), unspecified launch failure.
  [Hint: Please search for the error code(719) on website (https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038) to get Nvidia's official solution and advice about CUDA Error.] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:252)

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/tokenizer_utils_base.py", line 698, in convert_to_tensors
    tensor = as_tensor(value)
  File "/usr/local/lib/python3.7/dist-packages/paddle/tensor/creation.py", line 554, in to_tensor
    return _to_tensor_static(data, dtype, stop_gradient)
  File "/usr/local/lib/python3.7/dist-packages/paddle/tensor/creation.py", line 472, in _to_tensor_static
    output = assign(data)
  File "/usr/local/lib/python3.7/dist-packages/paddle/tensor/creation.py", line 1868, in assign
    value_name: values,
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/layer_helper.py", line 45, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/framework.py", line 4046, in append_op
    attrs=kwargs.get("attrs", None),
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/framework.py", line 3037, in __init__
    self.desc.infer_shape(self.block.desc)
RuntimeError: (NotFound) The kernel assign_value is not registered. [Hint: Expected iter != kernels.end(), but received iter == kernels.end().] (at /paddle/paddle/phi/core/kernel_factory.cc:197) [operator < assign_value > error]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/dataloader_iter.py", line 218, in _thread_loop
    self._thread_done_event)
  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dataloader/fetcher.py", line 138, in fetch
    data = self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/data/data_collator.py", line 199, in __call__
    return_attention_mask=self.return_attention_mask,
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/tokenizer_utils_base.py", line 2619, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/tokenizer_utils_base.py", line 229, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/usr/local/lib/python3.7/dist-packages/paddlenlp-2.5.2.post0-py3.7.egg/paddlenlp/transformers/tokenizer_utils_base.py", line 708, in convert_to_tensors
    "Unable to create tensor, you should probably activate truncation and/or padding "
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.

LAUNCH INFO 2023-08-07 10:28:10,123 Pod failed
LAUNCH ERROR 2023-08-07 10:28:10,123 Container failed !!!
Container rank 0 status failed cmd ['/usr/bin/python3.7', '-u', 'finetune.py', '--device', 'gpu', '--logging_steps', '5', '--save_steps', '100', '--eval_steps', '100', '--seed', '42', '--model_name_or_path', 'uie-x-base', '--output_dir', 'checkpoint-auto-4-2/model_best', '--train_path', 'train/data/4/train.txt', '--dev_path', 'train/data/4/dev.txt', '--max_seq_len', '512', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--num_train_epochs', '80', '--learning_rate', '1e-5', '--do_train', '--do_eval', '--do_export', '--export_model_dir', 'checkpoint-auto-4-2/model_best', '--overwrite_output_dir', '--disable_tqdm', 'True', '--metric_for_best_model', 'eval_f1', '--load_best_model_at_end', 'True', '--save_total_limit', '1'] code 1 log log/workerlog.0
env {'GREP_COLOR': '1;31', 'CUDNN_VERSION': '8.1.1.33', 'LC_ALL': 'en_US.UTF-8', 'LD_LIBRARY_PATH': '/usr/local/lib/python3.7/dist-packages/cv2/../../lib64:/usr/local/TensorRT-8.0.3.4/lib:/usr/local/cuda-11.2/targets/x86_64-linux/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64', 'LANG': 'en_US.UTF-8', 'HOSTNAME': 'f82f758c2aa9', 'OLDPWD': '/paddle/PaddleNLP-2.5.2/applications/information_extraction/document/train', 'WITH_GPU': 'ON', 'NVIDIA_VISIBLE_DEVICES': 'all', 'NCCL_VERSION': '2.8.4', 'GOPATH': '/root/gopath', 'PWD': '/paddle/PaddleNLP-2.5.2/applications/information_extraction/document', 'HOME': '/root', 'GOROOT': '/usr/local/go', 'CLICOLOR': '1', 'DEBIAN_FRONTEND': 'noninteractive', 'GREP_OPTIONS': '--color=auto', 'LIBRARY_PATH': '/usr/local/cuda/lib64/stubs', 'TERM': 'xterm', 'WITH_AVX': 'ON', 'CUDA_VERSION': '11.2.1', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'CUDA_VISIBLE_DEVICES': '0', 'SHLVL': '1', 'LANGUAGE': 'en_US.UTF-8', 'NVIDIA_REQUIRE_CUDA': 'cuda>=11.2 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441 driver>=450,driver<451', 'PATH': '/home/cmake-3.16.0-Linux-x86_64/bin:/usr/local/gcc-8.2/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/go/bin:/root/gopath/bin', 'PS1': '\[\033[1;33m\]λ \[\033[1;37m\]\h \[\033[1;32m\]\w \[\033[0m\]', '_': '/usr/bin/python3.7', 'CUSTOM_DEVICE_ROOT': '', 'OMP_NUM_THREADS': '1', 'QT_QPA_PLATFORM_PLUGIN_PATH': '/usr/local/lib/python3.7/dist-packages/cv2/qt/plugins', 'QT_QPA_FONTDIR': '/usr/local/lib/python3.7/dist-packages/cv2/qt/fonts', 'POD_NAME': 'zvyhkr', 'PADDLE_MASTER': '172.17.0.2:45402', 'PADDLE_GLOBAL_SIZE': '1', 'PADDLE_LOCAL_SIZE': '1', 'PADDLE_GLOBAL_RANK': '0', 'PADDLE_LOCAL_RANK': '0', 'PADDLE_NNODES': '1', 'PADDLE_TRAINER_ENDPOINTS': '172.17.0.2:45403', 'PADDLE_CURRENT_ENDPOINT': '172.17.0.2:45403', 'PADDLE_TRAINER_ID': '0', 'PADDLE_TRAINERS_NUM': '1', 'PADDLE_RANK_IN_NODE': '0', 'FLAGS_selected_gpus': '0'}
LAUNCH INFO 2023-08-07 10:28:10,123 ------------------------- ERROR LOG DETAIL ------------------------- )


w5688414 commented 4 months ago

Which versions of paddle, paddlenlp, and CUDA are you using? Judging from the error, this looks like a problem with a CUDA kernel:

RuntimeError: (NotFound) The kernel assign_value is not registered.
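A minimal sketch for collecting the version information asked for above, assuming a Python 3.7+ environment like the one in the logs. It reads `paddle.__version__`, `paddlenlp.__version__`, `paddle.version.cuda()`, `paddle.version.cudnn()` and `paddle.device.is_compiled_with_cuda()`; the helper name `collect_versions` is hypothetical. Missing packages are reported as `None` instead of raising.

```python
import importlib
import platform


def collect_versions():
    """Hypothetical helper: gather Python, Paddle and PaddleNLP version info."""
    info = {"python": platform.python_version()}
    for name in ("paddle", "paddlenlp"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            info[name] = None  # package not installed in this environment
            continue
        info[name] = getattr(module, "__version__", "unknown")
        if name == "paddle":
            # CUDA/cuDNN versions this Paddle build was compiled against,
            # which is not the same thing as the installed driver version.
            info["paddle_cuda"] = module.version.cuda()
            info["paddle_cudnn"] = module.version.cudnn()
            info["compiled_with_cuda"] = module.device.is_compiled_with_cuda()
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

The driver version itself is shown by `nvidia-smi`. Running `python -c "import paddle; paddle.utils.run_check()"` is another quick way to confirm that the installed Paddle build can actually launch work on the GPU.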

Also, if you are not using the official data, check whether any of your samples are abnormally long or short.
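A rough sketch of the data check suggested above. It assumes each line of `train.txt` is a JSON object with `content`, `prompt`, `result_list` (entries carrying `start`/`end` character offsets) and an optional `bbox` list with one box per character of `content`. These field names and the per-character alignment are assumptions to verify against your own data; the helper names and thresholds are hypothetical. It only flags suspicious samples. It does not prove that any of them causes the CUDA error, and long samples are not necessarily invalid if the training script splits them.

```python
import json
import sys


def check_sample(sample, max_len=512, min_len=1):
    """Hypothetical check: return a list of problems found in one sample."""
    problems = []
    content = sample.get("content", "")
    prompt = sample.get("prompt", "")
    if len(content) < min_len:
        problems.append(f"content length {len(content)} < {min_len}")
    if not prompt:
        problems.append("empty prompt")
    total = len(content) + len(prompt)
    if total > max_len:
        # Not necessarily fatal, but unusually long inputs are worth a look.
        problems.append(f"content+prompt length {total} > {max_len}")
    bbox = sample.get("bbox")
    # Assumption: one bounding box per character of `content`.
    if bbox is not None and len(bbox) != len(content):
        problems.append(f"{len(bbox)} boxes for {len(content)} characters")
    for item in sample.get("result_list", []):
        start, end = item.get("start"), item.get("end")
        # A labeled span must lie inside the content string.
        if start is None or end is None or not 0 <= start < end <= len(content):
            problems.append(f"span ({start}, {end}) outside content")
    return problems


def scan_lines(lines, **limits):
    """Yield (line_number, problems) for every suspicious JSON line."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            sample = json.loads(line)
        except json.JSONDecodeError as exc:
            yield line_no, [f"invalid JSON: {exc}"]
            continue
        problems = check_sample(sample, **limits)
        if problems:
            yield line_no, problems


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_data.py train/data/4/train.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for line_no, problems in scan_lines(f):
                print(f"{path}:{line_no}: " + "; ".join(problems))
```

Pass the same `max_len` you give `finetune.py` via `--max_seq_len`, for example `scan_lines(f, max_len=512)`.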

PaddlePaddle / PaddleNLP

[Question]: OSError: (External) CUDA error(700), an illegal memory access was encountered. #6609
