PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
12.17k stars 2.95k forks

[Question]: Parallel training error: ValueError: (InvalidArgument) The axis is expected to be in range of [0, 0), but got 0 #6697

Open haozaiiii opened 1 year ago

haozaiiii commented 1 year ago

Please describe your question

I get the following error during parallel training and would appreciate help:

Traceback (most recent call last):
  File "/home/pyh/ie2/PaddleNLP-release-2.5/model_zoo/uie/finetune.py", line 245, in <module>
    main()
  File "/home/pyh/ie2/PaddleNLP-release-2.5/model_zoo/uie/finetune.py", line 184, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/root/anaconda3/envs/pybak/lib/python3.9/site-packages/paddlenlp/trainer/trainer.py", line 716, in train
    self._maybe_log_save_evaluate(tr_loss, model, epoch, ignore_keys_for_eval, inputs=inputs)
  File "/root/anaconda3/envs/pybak/lib/python3.9/site-packages/paddlenlp/trainer/trainer.py", line 810, in _maybe_log_save_evaluate
    tr_loss_scalar = self._nested_gather(tr_loss).mean().item()
  File "/root/anaconda3/envs/pybak/lib/python3.9/site-packages/paddlenlp/trainer/trainer.py", line 1954, in _nested_gather
    tensors = distributed_concat(tensors)
  File "/root/anaconda3/envs/pybak/lib/python3.9/site-packages/paddlenlp/trainer/utils/helper.py", line 45, in distributed_concat
    concat = paddle.concat(output_tensors, axis=0)
  File "/root/anaconda3/envs/pybak/lib/python3.9/site-packages/paddle/tensor/manipulation.py", line 1121, in concat
    return _C_ops.concat(input, axis)
ValueError: (InvalidArgument) The axis is expected to be in range of [0, 0), but got 0
  [Hint: Expected axis >= -rank && axis < rank == true, but received axis >= -rank && axis < rank:0 != true:1.] (at ../paddle/phi/infermeta/multiary.cc:961)

I0811 17:28:05.926939 101147 tcp_store.cc:273] receive shutdown event and so quit from MasterDaemon run loop
LAUNCH INFO 2023-08-11 17:28:07,606 Pod failed
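For readers trying to interpret the message above: the hint states the check `axis >= -rank && axis < rank`, so an allowed range of `[0, 0)` implies the tensors reaching `paddle.concat` had rank 0, i.e. they were 0-D (scalar) tensors. The following is a minimal pure-Python sketch of that range check only. It is not Paddle's implementation, and the helper names `check_concat_axis` and `describe_concat` are hypothetical. That the gathered `tr_loss` is the 0-D tensor here is an assumption based on the traceback, not something the thread confirms.

```python
def check_concat_axis(axis: int, rank: int) -> int:
    """Hypothetical stand-in for the check quoted in the hint:
    the axis must satisfy -rank <= axis < rank."""
    if not (-rank <= axis < rank):
        raise ValueError(
            f"The axis is expected to be in range of [{-rank}, {rank}), but got {axis}"
        )
    # Normalize a negative axis to its non-negative equivalent.
    return axis if axis >= 0 else axis + rank


def describe_concat(shape: tuple, axis: int = 0) -> str:
    """Report whether concatenating tensors of this shape along `axis`
    would pass the range check. The rank is the number of dimensions."""
    try:
        check_concat_axis(axis, len(shape))
        return "ok"
    except ValueError as exc:
        return str(exc)


# A scalar has shape () and rank 0, so no axis is valid, not even 0.
print(describe_concat(()))
# A one-element vector has shape (1,) and rank 1, so axis 0 is valid.
print(describe_concat((1,)))
```

If this reading is right, giving each gathered tensor at least one dimension before concatenation (for example with `paddle.reshape(t, [1])`) would pass the check; whether that is the appropriate fix for this setup depends on the installed paddle and paddlenlp versions.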

w5688414 commented 6 months ago

Which versions of paddle and paddlenlp are you using, and how can the problem be reproduced?