PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
12.16k stars, 2.94k forks

bvar is busy at sampling for 2 seconds #2701

Closed pengfan7758258 closed 1 year ago

pengfan7758258 commented 2 years ago

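Version and environment information:
1) PaddleNLP and PaddlePaddle versions: paddlenlp 2.3.3, paddlepaddle-gpu 2.3.0
2) System environment: Linux (Ubuntu), Python 3.8.13

I am fine-tuning UIE. The command I ran was copied from the example on the official website: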

python -u -m paddle.distributed.launch --gpus "0" finetune.py \
  --train_path ./data/train.txt \
  --dev_path ./data/dev.txt \
  --save_dir ./checkpoint \
  --learning_rate 1e-5 \
  --batch_size 16 \
  --max_seq_len 512 \
  --num_epochs 100 \
  --model uie-base \
  --seed 1000 \
  --logging_steps 10 \
  --valid_steps 100 \
  --device gpu

Log from the run:

[screenshot: log]

LemonNoel commented 2 years ago

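Does the log/workerlog.0 file in the current directory contain any other error messages?

If not, you can run the commands below to check whether paddle was installed successfully.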

import paddle
paddle.utils.run_check()

Alternatively, upgrade paddlenlp to the latest version: pip install paddlenlp==2.3.4
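Before rerunning the check or upgrading, it may help to confirm which versions are actually installed in the active environment. The following is only a minimal sketch: it assumes the pip distribution names `paddlenlp` and `paddlepaddle-gpu` from the report above, and `installed_version` is a hypothetical helper, not part of Paddle.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a pip package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names taken from the report above (assumed to match pip).
    for name in ("paddlenlp", "paddlepaddle-gpu"):
        print(name, installed_version(name) or "not installed")
```

Because it reads package metadata rather than importing the packages, this should work even when importing paddle itself fails.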

pengfan7758258 commented 2 years ago

@LemonNoel, the output is as follows:

[screenshot: log]

pengfan7758258 commented 2 years ago

@LemonNoel One more detail: when I ran the fine-tuning earlier, the memory of the GPU assigned for training was already in use.
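One way to see how much memory each GPU already has in use before launching training is to query `nvidia-smi`. This is a rough sketch, assuming `nvidia-smi` is on the PATH and reports integer MiB values; `parse_gpu_memory` and `query_gpu_memory` are hypothetical helper names, not part of Paddle.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' CSV lines into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory use; return None if it cannot be queried."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    try:
        return parse_gpu_memory(result.stdout)
    except ValueError:
        # Some devices report non-numeric values such as "[N/A]".
        return None


if __name__ == "__main__":
    gpus = query_gpu_memory()
    if gpus is None:
        print("Could not query nvidia-smi")
    else:
        for index, used, total in gpus:
            print(f"GPU {index}: {used} MiB used of {total} MiB")
```

If the GPU passed to `--gpus` is nearly full, picking a free index or stopping the other process first would rule out memory pressure as a factor.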

[screenshot: log]

LemonNoel commented 2 years ago

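(Quoting pengfan7758258) @LemonNoel, the output is as follows: [screenshot: log]

It looks like there is a problem with the NCCL installation. You could try installing paddlepaddle-gpu with conda, then run the check again to see whether it installed successfully for multi-GPU use.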

pengfan7758258 commented 2 years ago

@LemonNoel Does NCCL need to be installed separately? I created a new conda virtual environment and reinstalled paddlepaddle-gpu, but I get the same error.

LemonNoel commented 2 years ago

Yes, NCCL needs to be reinstalled. You can refer to NVIDIA's official documentation at https://docs.nvidia.com/deeplearning/nccl/install-guide/index.html , or try installing it with conda: https://libraries.io/conda/nccl
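After reinstalling, a quick way to check whether the system's dynamic loader can find NCCL at all is to load the shared library and ask for its version. This is a sketch under assumptions: it assumes the NCCL 2.x shared library name `libnccl.so.2` and uses NCCL's C function `ncclGetVersion`. `nccl_version_code` is a hypothetical helper, and Paddle may look for NCCL in other locations, so a `None` result here is a hint rather than proof.

```python
import ctypes


def nccl_version_code(library_name="libnccl.so.2"):
    """Return NCCL's integer version code, or None if the library cannot be used."""
    try:
        nccl = ctypes.CDLL(library_name)
        get_version = nccl.ncclGetVersion
    except (OSError, AttributeError):
        # Library not found by the loader, or too old to export ncclGetVersion.
        return None
    version = ctypes.c_int(0)
    # ncclGetVersion(int*) returns 0 (ncclSuccess) on success.
    status = get_version(ctypes.byref(version))
    return version.value if status == 0 else None


if __name__ == "__main__":
    code = nccl_version_code()
    if code is None:
        print("NCCL not found on the default library search path")
    else:
        print("NCCL version code:", code)
```

If this prints a version code but paddle.utils.run_check() still fails on multiple GPUs, the problem may lie with which NCCL build Paddle picks up rather than with NCCL being absent.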

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.