PaddlePaddle / Paddle

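PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)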
http://www.paddlepaddle.org/
Apache License 2.0

Single-GPU training fails with ValueError: paddle.distributed initialize error, environment variable FLAGS_selected_gpus is needed, but not set. #66092

Open gongchenting opened 1 month ago

gongchenting commented 1 month ago

Describe the Bug

Launch command:

export CUDA_VISIBLE_DEVICES=1
python3.7 tools/train.py -c ./ppcls/configs/car/car_poolformer_tricks_v15_re3.yaml

Environment:

Python 3.7.0
CUDA Version: 11.4
GPU: V100
paddle: 2.4.1

Error message:

Traceback (most recent call last):
  File "tools/train.py", line 31, in <module>
    engine = Engine(config, mode="train")
  File "/root/paddlejob/workspace/env_run/xxx/ppcls/engine/engine.py", line 231, in __init__
    dist.init_parallel_env()
  File "/usr/local/lib/python3.7/site-packages/paddle/distributed/parallel.py", line 197, in init_parallel_env
    _check_var_exists("FLAGS_selected_gpus")
  File "/usr/local/lib/python3.7/site-packages/paddle/distributed/parallel.py", line 100, in _check_var_exists
    "environment variable %s is needed, but not set." % var_name
ValueError: paddle.distributed initialize error, environment variable FLAGS_selected_gpus is needed, but not set.

Additional Supplementary Information

No response

Sunting78 commented 1 month ago

Hello. Please set the environment variable that specifies which GPUs are selected for training, for example: export FLAGS_selected_gpus=1
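If exporting the variable in the shell is inconvenient, the same thing can be done from Python before training starts. The following is only a minimal sketch: `ensure_selected_gpus` is a hypothetical helper, not part of Paddle or of the training script, and the default value "0" is an assumption. The thread does not confirm whether the value should be the physical GPU ID or the index within CUDA_VISIBLE_DEVICES, so check that against your own setup. Judging by the traceback, the variable has to be in place before dist.init_parallel_env() runs.

```python
import os


def ensure_selected_gpus(env=None, default="0"):
    """Hypothetical helper: set FLAGS_selected_gpus if it is missing.

    Returns the value in effect afterwards. The default "0" is an
    assumption for illustration, not a value confirmed in this thread.
    """
    if env is None:
        env = os.environ
    # setdefault keeps a value the user has already exported
    # and only fills in the default when the variable is absent.
    return env.setdefault("FLAGS_selected_gpus", default)


# Example with a plain dict standing in for the process environment:
demo_env = {"CUDA_VISIBLE_DEVICES": "1"}
selected = ensure_selected_gpus(demo_env)
print("FLAGS_selected_gpus =", selected)
```

In a real run this would be called with no arguments, so that it acts on os.environ, near the top of the entry script and before the engine is constructed.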