Error during installation: The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.

liangshu-code commented 9 months ago

Issue Description

After installation, running paddle.utils.run_check() reports the following:

import paddle
paddle.utils.run_check()
Running verify PaddlePaddle program ...
I0726 14:31:17.682049 212581 interpretercore.cc:237] New Executor is Running.
W0726 14:31:17.682821 212581 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0726 14:31:17.682865 212581 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.7
W0726 14:31:17.839526 212581 gpu_resources.cc:149] device: 0, cuDNN Version: 8.4.
I0726 14:31:27.508836 212581 interpreter_util.cc:518] Standalone Executor is Used.
PaddlePaddle works well on 1 GPU.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

Version & Environment Information

Paddle version: 2.5.0
Paddle With CUDA: True

OS: ubuntu 20.04
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: N/A
Libc version: glibc 2.31
Python version: 3.8.0

CUDA version: 11.7.64 Build cuda_11.7.r11.7/compiler.31294372_0
cuDNN version: N/A
Nvidia driver version: 535.54.03
Nvidia driver List: GPU 0: Quadro P5000
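The warning can be read as a plain mismatch between the device's compute capability (6.1 here) and the arch list the wheel was built for (70 75 80 86). A minimal sketch of that comparison follows. It assumes only binary compatibility matters (code built for X.y runs on a device X.z when z >= y) and ignores any PTX JIT fallback. The helper names `parse_arch` and `has_native_kernels` are hypothetical and not part of Paddle.

```python
from typing import Iterable, Tuple


def parse_arch(arch: str) -> Tuple[int, int]:
    """Split an arch tag such as "70" or "86" into (major, minor)."""
    arch = arch.strip()
    return int(arch[:-1]), int(arch[-1])


def has_native_kernels(capability: Tuple[int, int], wheel_archs: Iterable[str]) -> bool:
    """Return True if any compiled arch can run natively on the device.

    Assumption: only binary compatibility is modelled, i.e. code built for
    X.y runs on a device X.z when z >= y. PTX JIT fallback is ignored.
    """
    major, minor = capability
    for arch in wheel_archs:
        arch_major, arch_minor = parse_arch(arch)
        if arch_major == major and arch_minor <= minor:
            return True
    return False


# Arch list copied from the warning text in this issue.
WHEEL_ARCHS = "70 75 80 86".split()

print(has_native_kernels((6, 1), WHEEL_ARCHS))  # capability 6.1, as reported in the log
print(has_native_kernels((8, 6), WHEEL_ARCHS))  # an 8.6 device, for contrast
```

Under these assumptions a 6.x device finds no matching arch in the list, which is consistent with the warning being printed.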

YanhuiDua commented 9 months ago

Hello, this is only a warning; the installation itself is fine. If you run into problems while using Paddle, feel free to ask again. See: https://github.com/PaddlePaddle/Paddle/issues/54713

chuwang9964 commented 9 months ago

I ran into this problem too. It stopped while running training.

YanhuiDua commented 9 months ago

Hello, what error is reported when training stops?

GM5GM5 commented 9 months ago

Training stops right at this step after it starts.
W0816 11:37:29.171767 14600 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0816 11:37:29.172734 14600 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.2
W0816 11:37:29.177718 14600 gpu_resources.cc:149] device: 0, cuDNN Version: 8.2.
[2023/08/16 11:37:29] ppocr INFO: train dataloader has 18 iters
[2023/08/16 11:37:29] ppocr INFO: valid dataloader has 48 iters
[2023/08/16 11:37:29] ppocr INFO: load pretrain successful from ./pretrain_models/ch_ppocr_server_v2.0_det_train/best_accuracy
[2023/08/16 11:37:29] ppocr INFO: During the training process, after the 3000th iteration, an evaluation is run every 2000 iterations

YanhuiDua commented 9 months ago

Hello, this log contains no error message. Please provide your Paddle version and the command you ran.

GM5GM5 commented 9 months ago

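I solved the problem. It was a Paddle version issue: I had installed with pip using the command copied directly from the official website. I then downloaded the package and compiled it myself, which fixed it, and training now runs successfully. The original version was 2.6.1, and the command was python tools/train.py -c configs/det/det_mv3_db.yml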

BaoyuLi12138 commented 8 months ago

[2023-08-31 15:34:06,387] [ INFO] - We are using (<class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'>, False) to load 'uie-senta-base'.
[2023-08-31 15:34:06,387] [ INFO] - Already cached /root/.paddlenlp/models/uie-senta-base/ernie_3.0_base_zh_vocab.txt
[2023-08-31 15:34:06,410] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/uie-senta-base/tokenizer_config.json
[2023-08-31 15:34:06,411] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/uie-senta-base/special_tokens_map.json
[2023-08-31 15:34:06,412] [ INFO] - Already cached /root/.paddlenlp/models/uie-senta-base/model_state.pdparams
[2023-08-31 15:34:06,412] [ INFO] - Loading weights file model_state.pdparams from cache at /root/.paddlenlp/models/uie-senta-base/model_state.pdparams
[2023-08-31 15:34:06,883] [ INFO] - Loaded weights file from disk, setting weights to model.
W0831 15:34:06.887965 1829614 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0831 15:34:06.887992 1829614 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.7
W0831 15:34:06.891219 1829614 gpu_resources.cc:149] device: 0, cuDNN Version: 7.6.

Process finished with exit code 134. The run stopped as soon as I hit this problem. Do I need to update the cuDNN version?

BaoyuLi12138 commented 8 months ago

paddle-bfloat 0.1.7
paddle2onnx 1.0.9
paddlefsl 1.1.0
paddlenlp 2.6.0
paddlepaddle 2.5.1
paddlepaddle-gpu 2.5.1.post117
python ==3.9.12
These are my Paddle-related package versions.

YanhuiDua commented 8 months ago

Try running python -c "import paddle;paddle.utils.run_check()" and check whether the output is normal.

BaoyuLi12138 commented 8 months ago

I am now using the Docker environment from the official website: nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.5.1-gpu-cuda11.7-cudnn8.4-trt8.4

W0831 10:10:01.869431 1333 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0831 10:10:01.869459 1333 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.7
W0831 10:10:02.049255 1333 gpu_resources.cc:149] device: 0, cuDNN Version: 8.4.
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
This is the problem I am getting now.

The environment is:
paddle-bfloat 0.1.7
paddle2onnx 1.0.9
paddlefsl 1.1.0
paddlenlp 2.6.0
paddlepaddle-gpu 2.5.1.post117

The GPU is a 1080 Ti. Could it be that the model is not supported?
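When comparing environments like the ones posted in this thread, it can help to pull the device facts out of the "Please NOTE" log line instead of reading them by eye. The following is a small sketch that assumes the line keeps the exact wording shown in the logs above. The pattern and the function name `parse_gpu_note` are illustrative, not a Paddle API.

```python
import re
from typing import Dict, Optional

# Pattern written against the "Please NOTE" line as it appears in this thread.
NOTE_PATTERN = re.compile(
    r"device: (?P<device>\d+), "
    r"GPU Compute Capability: (?P<capability>\d+\.\d+), "
    r"Driver API Version: (?P<driver>\d+\.\d+), "
    r"Runtime API Version: (?P<runtime>\d+\.\d+)"
)


def parse_gpu_note(line: str) -> Optional[Dict[str, str]]:
    """Extract device id, compute capability and CUDA versions from one log line."""
    match = NOTE_PATTERN.search(line)
    return match.groupdict() if match else None


sample = (
    "W0831 10:10:01.869459 1333 gpu_resources.cc:119] Please NOTE: device: 0, "
    "GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.7"
)
print(parse_gpu_note(sample))
```

The extracted capability and runtime version are the two values worth checking against the arch list and the CUDA build of the installed wheel.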

YanhuiDua commented 8 months ago

Hello, we have received this report and will look into it.

YanhuiDua commented 8 months ago

As a first step, try testing with an image and whl package built for a lower CUDA version.

BaoyuLi12138 commented 8 months ago

[2023-09-01 03:33:44,390] [ INFO] - Loading weights file model_state.pdparams from cache at /root/.paddlenlp/models/uie-senta-base/model_state.pdparams
[2023-09-01 03:33:45,324] [ INFO] - Loaded weights file from disk, setting weights to model.
W0901 03:33:45.331705 1222 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0901 03:33:45.331734 1222 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.2
W0901 03:33:45.335844 1222 gpu_resources.cc:149] device: 0, cuDNN Version: 8.2.
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
Aborted (core dumped)

This is the current problem. I searched online, and the advice for "core dumped" seems to be to reduce batch_size. I have now set batch_size to 4, but it still fails. Is something wrong with my machine's configuration?

My current command line is:
python finetune.py \
  --train_path ./data/train.json \
  --dev_path ./data/dev.json \
  --save_dir ./checkpoint \
  --learning_rate 1e-5 \
  --batch_size 4 \
  --max_seq_len 512 \
  --num_epochs 3 \
  --model uie-senta-base \
  --seed 1000 \
  --logging_steps 10 \
  --valid_steps 100 \
  --device gpu

Current package versions:
paddle-bfloat 0.1.7
paddle2onnx 1.0.9
paddlefsl 1.1.0
paddlenlp 2.6.0
paddlepaddle-gpu 2.5.1.post112

BaoyuLi12138 commented 8 months ago

I changed the command line to:
python -u -m paddle.distributed.launch --gpus "0" finetune.py --train_path ./data/train.json --dev_path ./data/dev.json --save_dir ./checkpoint --learning_rate 1e-5 --batch_size 4 --max_seq_len 512 --num_epochs 3 --model uie-senta-base --seed 1000 --logging_steps 10 --valid_steps 100 --device gpu

It now produces:
[2023-09-04 05:42:17,598] [ INFO] - We are using (<class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'>, False) to load 'uie-senta-base'.
[2023-09-04 05:42:17,599] [ INFO] - Already cached /root/.paddlenlp/models/uie-senta-base/ernie_3.0_base_zh_vocab.txt
[2023-09-04 05:42:17,630] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/uie-senta-base/tokenizer_config.json
[2023-09-04 05:42:17,630] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/uie-senta-base/special_tokens_map.json
[2023-09-04 05:42:17,631] [ INFO] - Already cached /root/.paddlenlp/models/uie-senta-base/model_state.pdparams
[2023-09-04 05:42:17,632] [ INFO] - Loading weights file model_state.pdparams from cache at /root/.paddlenlp/models/uie-senta-base/model_state.pdparams
[2023-09-04 05:42:18,228] [ INFO] - Loaded weights file from disk, setting weights to model.
W0904 05:42:18.233268 1611 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0904 05:42:18.233296 1611 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.2
W0904 05:42:18.236320 1611 gpu_resources.cc:149] device: 0, cuDNN Version: 8.2.
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
LAUNCH INFO 2023-09-04 05:42:20,033 Pod failed
LAUNCH ERROR 2023-09-04 05:42:20,034 Container failed !!!
Container rank 0 status failed cmd ['/usr/bin/python', '-u', 'finetune.py'] code -6 log log/workerlog.0
env {'GREP_COLOR': '1;31', 'LC_ALL': 'en_US.UTF-8', 'SSH_CONNECTION': '192.168.4.93 65019 172.17.0.2 22', 'LANG': 'en_US.UTF-8', 'USER': 'root', 'PWD': '/paddle/PaddleNLP/applications/sentiment_analysis/unified_sentiment_extraction', 'HOME': '/root', 'CLICOLOR': '1', 'SSH_CLIENT': '192.168.4.93 65019 22', 'GREP_OPTIONS': '--color=auto', 'SSH_TTY': '/dev/pts/1', 'MAIL': '/var/mail/root', 'TERM': 'xterm', 'SHELL': '/bin/bash', 'SHLVL': '1', 'LANGUAGE': 'enUS.UTF-8', 'LOGNAME': 'root', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin', 'PS1': '[\033[1;33m]λ [\033[1;37m]\h [\033[1;32m]\w [\033[0m]', '': '/usr/bin/python', 'OLDPWD': '/paddle/PaddleNLP/applications/sentiment_analysis', 'CUSTOM_DEVICE_ROOT': '', 'OMP_NUM_THREADS': '1', 'POD_NAME': 'sygxoa', 'PADDLE_MASTER': '172.17.0.2:40366', 'PADDLE_GLOBAL_SIZE': '1', 'PADDLE_LOCAL_SIZE': '1', 'PADDLE_GLOBAL_RANK': '0', 'PADDLE_LOCAL_RANK': '0', 'PADDLE_NNODES': '1', 'PADDLE_TRAINER_ENDPOINTS': '172.17.0.2:40367', 'PADDLE_CURRENT_ENDPOINT': '172.17.0.2:40367', 'PADDLE_TRAINER_ID': '0', 'PADDLE_TRAINERS_NUM': '1', 'PADDLE_RANK_IN_NODE': '0', 'FLAGS_selected_gpus': '0'}
LAUNCH INFO 2023-09-04 05:42:20,034 ------------------------- ERROR LOG DETAIL -------------------------
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
[2023-09-04 05:42:17,598] [ INFO] - We are using (<class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'>, False) to load 'uie-senta-base'.
[2023-09-04 05:42:17,599] [ INFO] - Already cached /root/.paddlenlp/models/uie-senta-base/ernie_3.0_base_zh_vocab.txt
[2023-09-04 05:42:17,630] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/uie-senta-base/tokenizer_config.json
[2023-09-04 05:42:17,630] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/uie-senta-base/special_tokens_map.json
[2023-09-04 05:42:17,631] [ INFO] - Already cached /root/.paddlenlp/models/uie-senta-base/model_state.pdparams
[2023-09-04 05:42:17,632] [ INFO] - Loading weights file model_state.pdparams from cache at /root/.paddlenlp/models/uie-senta-base/model_state.pdparams
[2023-09-04 05:42:18,228] [ INFO] - Loaded weights file from disk, setting weights to model.
W0904 05:42:18.233268 1611 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0904 05:42:18.233296 1611 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.2
W0904 05:42:18.236320 1611 gpu_resources.cc:149] device: 0, cuDNN Version: 8.2.
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
LAUNCH INFO 2023-09-04 05:42:20,035 Exit code -6

YanhuiDua commented 8 months ago

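Hello, if running paddle.utils.run_check() on its own produces the "cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device" error, then the cause is unrelated to the run command or the model configuration; it is a problem with the Paddle installation. I suggest lowering the CUDA version or compiling from source.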
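The distinction drawn here, between a log that only carries the architecture warning and one that ends in the fatal kernel-image error, can be sketched as a small log classifier. The two marker strings are copied from the logs in this thread. The function name and the returned labels are made up for illustration.

```python
# Marker strings copied from the logs posted in this issue.
ARCH_WARNING = "is not compatible with Paddle installation with arch"
NO_KERNEL_IMAGE = "cudaErrorNoKernelImageForDevice"


def classify_log(text: str) -> str:
    """Label a log by the most severe arch-related message it contains."""
    if NO_KERNEL_IMAGE in text:
        # Fatal: the installed build has no kernels this GPU can execute.
        return "no_kernel_image"
    if ARCH_WARNING in text:
        # Only the warning was printed; no crash is visible in this log.
        return "arch_warning_only"
    return "clean"


warning_log = (
    "W0726 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, "
    "which is not compatible with Paddle installation with arch: 70 75 80 86"
)
fatal_log = warning_log + "\nwhat(): parallel_for failed: cudaErrorNoKernelImageForDevice"

print(classify_log(warning_log))
print(classify_log(fatal_log))
```

A result of "arch_warning_only" says nothing about whether computations are correct; it only means the log shows no crash.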

BaoyuLi12138 commented 8 months ago

Hello, as you suggested earlier, I ran python -c "import paddle;paddle.utils.run_check()" on its own.

This is the current output:
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
Running verify PaddlePaddle program ...
I0904 08:32:01.975775 1720 interpretercore.cc:237] New Executor is Running.
W0904 08:32:01.976156 1720 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0904 08:32:01.976168 1720 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.2
W0904 08:32:01.979440 1720 gpu_resources.cc:149] device: 0, cuDNN Version: 8.2.
I0904 08:32:04.316855 1720 interpreter_util.cc:518] Standalone Executor is Used.
PaddlePaddle works well on 1 GPU.
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_selected_gpus', current_value='3', default_value='')
I0904 08:32:05.960580 1767 tcp_utils.cc:107] Retry to connect to 127.0.0.1:34881 while the server is not yet listening.
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_selected_gpus', current_value='1', default_value='')
I0904 08:32:05.967067 1763 tcp_utils.cc:107] Retry to connect to 127.0.0.1:34881 while the server is not yet listening.
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_selected_gpus', current_value='2', default_value='')
I0904 08:32:05.975247 1765 tcp_utils.cc:107] Retry to connect to 127.0.0.1:34881 while the server is not yet listening.
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_selected_gpus', current_value='0', default_value='')
I0904 08:32:06.002357 1761 tcp_utils.cc:181] The server starts to listen on IP_ANY:34881
I0904 08:32:06.002689 1761 tcp_utils.cc:130] Successfully connected to 127.0.0.1:34881
I0904 08:32:08.960848 1767 tcp_utils.cc:130] Successfully connected to 127.0.0.1:34881
I0904 08:32:08.967320 1763 tcp_utils.cc:130] Successfully connected to 127.0.0.1:34881
I0904 08:32:08.975486 1765 tcp_utils.cc:130] Successfully connected to 127.0.0.1:34881
W0904 08:32:09.957010 1761 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0904 08:32:09.957083 1761 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.2
W0904 08:32:09.960577 1761 gpu_resources.cc:149] device: 0, cuDNN Version: 8.2.
W0904 08:32:09.998554 1763 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0904 08:32:09.998610 1763 gpu_resources.cc:119] Please NOTE: device: 1, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.2
W0904 08:32:09.998611 1767 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0904 08:32:09.998682 1767 gpu_resources.cc:119] Please NOTE: device: 3, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.2
W0904 08:32:09.998685 1765 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0904 08:32:09.998782 1765 gpu_resources.cc:119] Please NOTE: device: 2, GPU Compute Capability: 6.1, Driver API Version: 11.7, Runtime API Version: 11.2
W0904 08:32:10.005148 1763 gpu_resources.cc:149] device: 1, cuDNN Version: 8.2.
W0904 08:32:10.005192 1765 gpu_resources.cc:149] device: 2, cuDNN Version: 8.2.
W0904 08:32:10.005231 1767 gpu_resources.cc:149] device: 3, cuDNN Version: 8.2.
Failed, NCCL error ../paddle/fluid/distributed/collective/process_group_nccl.cc:660 'unhandled system error'
Failed, NCCL error ../paddle/fluid/distributed/collective/process_group_nccl.cc:660 'unhandled system error'

C++ Traceback (most recent call last):
0 paddle::distributed::ProcessGroupNCCL::Barrier(paddle::distributed::BarrierOptions const&)
1 paddle::distributed::ProcessGroupNCCL::AllReduce(phi::DenseTensor, phi::DenseTensor const&, paddle::distributed::AllreduceOptions const&, bool, bool)
2 paddle::distributed::ProcessGroupNCCL::RunFnInNCCLEnv(std::function<void (ncclComm, CUstream_st*)>, phi::DenseTensor const&, paddle::distributed::CommType, bool, bool)
3 paddle::distributed::ProcessGroupNCCL::CreateNCCLEnvCache(phi::Place const&, std::string const&)
4 ncclCommInitRank

Error Message Summary:
FatalError: Termination signal is detected by the operating system.
[TimeInfo: Aborted at 1693816330 (unix time) try "date -d @1693816330" if you are using GNU date ]
[SignalInfo: SIGTERM (@0x6b8) received by PID 1761 (TID 0x7f639e388740) from PID 1720 ]

C++ Traceback (most recent call last):
0 paddle::distributed::ProcessGroupNCCL::Barrier(paddle::distributed::BarrierOptions const&)
1 paddle::distributed::ProcessGroupNCCL::AllReduce(phi::DenseTensor, phi::DenseTensor const&, paddle::distributed::AllreduceOptions const&, bool, bool)
2 paddle::distributed::ProcessGroupNCCL::RunFnInNCCLEnv(std::function<void (ncclComm, CUstream_st*)>, phi::DenseTensor const&, paddle::distributed::CommType, bool, bool)
3 paddle::distributed::ProcessGroupNCCL::CreateNCCLEnvCache(phi::Place const&, std::string const&)
4 ncclCommInitRank

Error Message Summary:
FatalError: Termination signal is detected by the operating system.
[TimeInfo: Aborted at 1693816330 (unix time) try "date -d @1693816330" if you are using GNU date ]
[SignalInfo: SIGTERM (@0x6b8) received by PID 1763 (TID 0x7fa368bda740) from PID 1720 ]

WARNING:root:PaddlePaddle meets some problem with 4 GPUs. This may be caused by:
There is not enough GPUs visible on your system
Some GPUs are occupied by other process now
NVIDIA-NCCL2 is not installed correctly on your system. Please follow instruction on https://github.com/NVIDIA/nccl-tests to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
WARNING:root: Original Error is: Process 2 terminated with exit code 1.
PaddlePaddle is installed successfully ONLY for single GPU! Let's start deep learning with PaddlePaddle now.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.7/dist-packages/paddle/utils/install_check.py", line 282, in run_check
raise e
File "/usr/local/lib/python3.7/dist-packages/paddle/utils/install_check.py", line 255, in run_check
_run_parallel(device_list)
File "/usr/local/lib/python3.7/dist-packages/paddle/utils/install_check.py", line 206, in _run_parallel
paddle.distributed.spawn(train_for_run_parallel, nprocs=len(device_list))
File "/usr/local/lib/python3.7/dist-packages/paddle/distributed/spawn.py", line 595, in spawn
while not context.join():
File "/usr/local/lib/python3.7/dist-packages/paddle/distributed/spawn.py", line 399, in join
self._throw_exception(error_index)
File "/usr/local/lib/python3.7/dist-packages/paddle/distributed/spawn.py", line 413, in _throw_exception
% (error_index, exitcode)
Exception: Process 2 terminated with exit code 1.

This is the complete output of the run. The cudaErrorNoKernelImageForDevice error did not appear. Could you please take another look at what the cause might be?

BaoyuLi12138 commented 8 months ago

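Hello, I found the problem: two pieces of code in the same directory required different environments, and that caused it. Thanks a lot for the guidance!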

BaoyuLi12138 commented 8 months ago

For the record, the final working setup:
cuda 10.2
cudnn 7.6
paddlepaddle-gpu 2.5.1-post102
paddlenlp 2.6.0
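The wheel versions quoted in this thread (2.5.1.post117, 2.5.1.post112, 2.5.1-post102) line up with the CUDA runtime versions reported next to them (11.7, 11.2, 10.2). Assuming that three-digit "post" suffix always encodes the CUDA version this way, which is inferred from this thread rather than taken from Paddle documentation, a hypothetical helper could recover the CUDA version from the package version:

```python
import re
from typing import Optional


def cuda_from_wheel_version(version: str) -> Optional[str]:
    """Map e.g. "2.5.1.post112" to "11.2"; return None when there is no post tag.

    Assumption: the suffix is exactly three digits, two for the CUDA major
    version and one for the minor, as seen in the versions quoted in this issue.
    """
    match = re.search(r"[.\-]post(\d{3})$", version)
    if match is None:
        return None
    digits = match.group(1)
    return f"{int(digits[:2])}.{digits[2]}"


for version in ("2.5.1.post117", "2.5.1.post112", "2.5.1-post102", "2.5.1"):
    print(version, "->", cuda_from_wheel_version(version))
```

Comparing this value with the "Runtime API Version" in the log is a quick way to see which CUDA build is actually installed.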

tianji2018 commented 8 months ago

Although it is reported only as a warning, Paddle is completely unusable: every computation result is wrong.

import paddle
paddle.ones([3,3])
W0913 16:00:46.068766 51120 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0913 16:00:46.068822 51120 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0913 16:00:46.073894 51120 gpu_resources.cc:149] device: 0, cuDNN Version: 8.2.
Tensor(shape=[3, 3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
[[0., 0., 0.],
 [0., 0., 0.],
 [0., 0., 0.]])

YanhuiDua commented 8 months ago

Please run python -c "import paddle;paddle.utils.run_check()" to check whether Paddle is installed correctly.

tianji2018 commented 8 months ago

python -c "import paddle;paddle.utils.run_check()"

Running verify PaddlePaddle program ...
I0913 16:34:03.777916 66339 interpretercore.cc:237] New Executor is Running.
W0913 16:34:03.778102 66339 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0913 16:34:03.778110 66339 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0913 16:34:03.779343 66339 gpu_resources.cc:149] device: 0, cuDNN Version: 8.2.
I0913 16:34:03.961378 66339 interpreter_util.cc:518] Standalone Executor is Used.
PaddlePaddle works well on 1 GPU.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

YanhuiDua commented 8 months ago

The installation is correct. Please provide your hardware details and Paddle version.

tianji2018 commented 8 months ago

System: Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-162-generic x86_64)
GPU: NVIDIA P100 16G
GPU driver version: 11.2
Version string: NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2
cudnn 8.2
conda environment: python 3.10.9
Paddle version: paddlepaddle-gpu==2.5.1.post112, installed with python -m pip install paddlepaddle-gpu==2.5.1.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html (I also tried the conda command and got the same problem)

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

YanhuiDua commented 8 months ago

The versions all appear to match. Could you test some other APIs?

tianji2018 commented 8 months ago

x = paddle.to_tensor([-0.4, -0.2, 0.1, 0.3])  # Tensor(shape=[4], dtype=float32, place=Place(gpu:0), stop_gradient=True, [-0.40000001, -0.20000000, 0.10000000, 0.30000001])
paddle.abs(x)  # [0., 0., 0., 0.]
paddle.argmax(x)  # 0
paddle.argmin(x)  # 0
x+1  # [0., 0., 0., 0.]
x = paddle.to_tensor([-4,-2,1,3])  # Tensor(shape=[4], dtype=int64, place=Place(gpu:0), stop_gradient=True, [-4, -2, 1, 3])
x+1  # [-4734183924231779123, 4510805389529107661, 0, 0]
paddle.isnan(x)  # Aborted (core dumped): the process crashes and exits
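Since run_check() passed on this machine while basic arithmetic returned zeros, a check that compares GPU results against values computed in plain Python may catch this failure mode earlier. The sketch below is one possible way to do that, not an official Paddle tool. The names `all_close` and `run_gpu_sanity_check` are hypothetical, and the Paddle calls used are limited to `paddle.set_device`, `paddle.to_tensor`, `paddle.ones`, `paddle.abs`, and `Tensor.numpy()`.

```python
import math
from typing import Sequence


def all_close(actual: Sequence[float], expected: Sequence[float], tol: float = 1e-5) -> bool:
    """Element-wise comparison of two flat sequences of numbers."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=tol, abs_tol=tol) for a, e in zip(actual, expected)
    )


def run_gpu_sanity_check() -> bool:
    """Run a few ops on the GPU and compare them with plain-Python results."""
    import paddle  # imported lazily so all_close() is usable without Paddle

    paddle.set_device("gpu")
    values = [-0.4, -0.2, 0.1, 0.3]
    x = paddle.to_tensor(values)
    checks = {
        "ones": (paddle.ones([3]).numpy().tolist(), [1.0, 1.0, 1.0]),
        "abs": (paddle.abs(x).numpy().tolist(), [abs(v) for v in values]),
        "add": ((x + 1).numpy().tolist(), [v + 1 for v in values]),
    }
    ok = True
    for name, (actual, expected) in checks.items():
        passed = all_close(actual, expected)
        print(f"{name}: {'ok' if passed else 'MISMATCH'} {actual}")
        ok = ok and passed
    return ok
```

Calling run_gpu_sanity_check() on the affected machine would be expected to print MISMATCH for the operations that returned zeros above. On a broken install it may also crash outright, as paddle.isnan did here.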

YanhuiDua commented 8 months ago

OK, received. We will take a look.

tianji2018 commented 8 months ago

Thank you for your hard work.

YanhuiDua commented 8 months ago

Hello, the P100 (sm60) architecture may run into problems with the CUDA 11.2 package. I suggest trying the CUDA 10.2 whl package or compiling from source.

neuxys commented 8 months ago

Hello, I ran into the same problem. When choosing a CUDA version, I found that 10.2 is not offered for Ubuntu 20.04. What should I do?

YanhuiDua commented 8 months ago

Hello, you will need to compile it yourself. I recommend Ubuntu 18.04; 20.04 may run into problems. For compilation, see https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/linux-compile-by-make.html

neuxys commented 8 months ago

Thank you for your reply! OK, I will give it a try.

tianji2018 commented 8 months ago

Hello, after downgrading Paddle to 2.4.0 and to 2.4.2, my tests work correctly with both versions.

schild commented 7 months ago

The versions are a complete mess. I have been at this for a week and it still does not work.

YanhuiDua commented 7 months ago

What specific problem did you run into? You are welcome to open a new issue.

PaddlePaddle / Paddle

Issue #55715
