PaddlePaddle / PaddleClas

A treasure chest for visual classification and recognition powered by PaddlePaddle
Apache License 2.0

Multi-GPU training fails with AttributeError: module 'paddle.fluid.libpaddle' has no attribute 'ProcessGroupNCCL' #2739

Closed happybear1015 closed 1 year ago

happybear1015 commented 1 year ago

(paddle_cls) D:\xx\PaddleClas>python -m paddle.distributed.launch --gpus="0,1" tools/train.py -c ./ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml
LAUNCH INFO 2023-04-06 13:26:02,235 ----------- Configuration ----------------------
LAUNCH INFO 2023-04-06 13:26:02,235 devices: 0,1
LAUNCH INFO 2023-04-06 13:26:02,235 elastic_level: -1
LAUNCH INFO 2023-04-06 13:26:02,235 elastic_timeout: 30
LAUNCH INFO 2023-04-06 13:26:02,235 gloo_port: 6767
LAUNCH INFO 2023-04-06 13:26:02,236 host: None
LAUNCH INFO 2023-04-06 13:26:02,236 ips: None
LAUNCH INFO 2023-04-06 13:26:02,236 job_id: default
LAUNCH INFO 2023-04-06 13:26:02,236 legacy: False
LAUNCH INFO 2023-04-06 13:26:02,236 log_dir: log
LAUNCH INFO 2023-04-06 13:26:02,236 log_level: INFO
LAUNCH INFO 2023-04-06 13:26:02,236 master: None
LAUNCH INFO 2023-04-06 13:26:02,236 max_restart: 3
LAUNCH INFO 2023-04-06 13:26:02,236 nnodes: 1
LAUNCH INFO 2023-04-06 13:26:02,236 nproc_per_node: None
LAUNCH INFO 2023-04-06 13:26:02,236 rank: -1
LAUNCH INFO 2023-04-06 13:26:02,236 run_mode: collective
LAUNCH INFO 2023-04-06 13:26:02,237 server_num: None
LAUNCH INFO 2023-04-06 13:26:02,237 servers:
LAUNCH INFO 2023-04-06 13:26:02,237 start_port: 6070
LAUNCH INFO 2023-04-06 13:26:02,237 trainer_num: None
LAUNCH INFO 2023-04-06 13:26:02,237 trainers:
LAUNCH INFO 2023-04-06 13:26:02,237 training_script: tools/train.py
LAUNCH INFO 2023-04-06 13:26:02,237 training_script_args: ['-c', './ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml']
LAUNCH INFO 2023-04-06 13:26:02,237 with_gloo: 1
LAUNCH INFO 2023-04-06 13:26:02,237 --------------------------------------------------
LAUNCH INFO 2023-04-06 13:26:02,238 Job: default, mode collective, replicas 1[1:1], elastic False
LAUNCH INFO 2023-04-06 13:26:02,239 Run Pod: ylqihp, replicas 2, status ready
LAUNCH INFO 2023-04-06 13:26:02,242 Watching Pod: ylqihp, replicas 2, status running
LAUNCH WARNING 2023-04-06 13:26:02,377 save gpu info failed
D:\xx\PaddleClas\ppcls\data\preprocess\ops\timm_autoaugment.py:39: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
D:\xx\PaddleClas\ppcls\data\preprocess\ops\timm_autoaugment.py:39: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
[2023/04/06 13:26:05] ppcls INFO:

== PaddleClas is powered by PaddlePaddle ! ==

== == == For more info please go to the following website. == == == == https://github.com/PaddlePaddle/PaddleClas ==

[2023/04/06 13:26:05] ppcls INFO: Arch :
[2023/04/06 13:26:05] ppcls INFO: class_num : 2
[2023/04/06 13:26:05] ppcls INFO: name : ResNet101_vd
[2023/04/06 13:26:05] ppcls INFO: DataLoader :
[2023/04/06 13:26:05] ppcls INFO: Eval :
[2023/04/06 13:26:05] ppcls INFO: dataset :
[2023/04/06 13:26:05] ppcls INFO: cls_label_path : ./dataset/val_list.txt
[2023/04/06 13:26:05] ppcls INFO: image_root : ./dataset/
[2023/04/06 13:26:05] ppcls INFO: name : ImageNetDataset
[2023/04/06 13:26:05] ppcls INFO: transform_ops :
[2023/04/06 13:26:05] ppcls INFO: DecodeImage :
[2023/04/06 13:26:05] ppcls INFO: channel_first : False
[2023/04/06 13:26:05] ppcls INFO: to_rgb : True
[2023/04/06 13:26:05] ppcls INFO: ResizeImage :
[2023/04/06 13:26:05] ppcls INFO: resize_short : 448
[2023/04/06 13:26:05] ppcls INFO: CropImage :
[2023/04/06 13:26:05] ppcls INFO: size : 448
[2023/04/06 13:26:05] ppcls INFO: NormalizeImage :
[2023/04/06 13:26:05] ppcls INFO: mean : [0.485, 0.456, 0.406]
[2023/04/06 13:26:05] ppcls INFO: order :
[2023/04/06 13:26:05] ppcls INFO: scale : 1.0/255.0
[2023/04/06 13:26:05] ppcls INFO: std : [0.229, 0.224, 0.225]
[2023/04/06 13:26:05] ppcls INFO: loader :
[2023/04/06 13:26:05] ppcls INFO: num_workers : 4
[2023/04/06 13:26:05] ppcls INFO: use_shared_memory : True
[2023/04/06 13:26:05] ppcls INFO: sampler :
[2023/04/06 13:26:05] ppcls INFO: batch_size : 16
[2023/04/06 13:26:05] ppcls INFO: drop_last : False
[2023/04/06 13:26:05] ppcls INFO: name : DistributedBatchSampler
[2023/04/06 13:26:05] ppcls INFO: shuffle : False
[2023/04/06 13:26:05] ppcls INFO: Train :
[2023/04/06 13:26:05] ppcls INFO: dataset :
[2023/04/06 13:26:05] ppcls INFO: batch_transform_ops : None
[2023/04/06 13:26:05] ppcls INFO: cls_label_path : ./dataset/train_list.txt
[2023/04/06 13:26:05] ppcls INFO: image_root : ./dataset/
[2023/04/06 13:26:05] ppcls INFO: name : ImageNetDataset
[2023/04/06 13:26:05] ppcls INFO: transform_ops :
[2023/04/06 13:26:05] ppcls INFO: DecodeImage :
[2023/04/06 13:26:05] ppcls INFO: channel_first : False
[2023/04/06 13:26:05] ppcls INFO: to_rgb : True
[2023/04/06 13:26:05] ppcls INFO: RandCropImage :
[2023/04/06 13:26:05] ppcls INFO: size : 448
[2023/04/06 13:26:05] ppcls INFO: RandFlipImage :
[2023/04/06 13:26:05] ppcls INFO: flip_code : 1
[2023/04/06 13:26:05] ppcls INFO: NormalizeImage :
[2023/04/06 13:26:05] ppcls INFO: mean : [0.485, 0.456, 0.406]
[2023/04/06 13:26:05] ppcls INFO: order :
[2023/04/06 13:26:05] ppcls INFO: scale : 1.0/255.0
[2023/04/06 13:26:05] ppcls INFO: std : [0.229, 0.224, 0.225]
[2023/04/06 13:26:05] ppcls INFO: loader :
[2023/04/06 13:26:05] ppcls INFO: num_workers : 10
[2023/04/06 13:26:05] ppcls INFO: use_shared_memory : True
[2023/04/06 13:26:05] ppcls INFO: sampler :
[2023/04/06 13:26:05] ppcls INFO: batch_size : 16
[2023/04/06 13:26:05] ppcls INFO: drop_last : False
[2023/04/06 13:26:05] ppcls INFO: name : DistributedBatchSampler
[2023/04/06 13:26:05] ppcls INFO: shuffle : True
[2023/04/06 13:26:05] ppcls INFO: Global :
[2023/04/06 13:26:05] ppcls INFO: checkpoints : D:\xx\PaddleClas\output\ResNet101_vd\epoch_130
[2023/04/06 13:26:05] ppcls INFO: device : gpu
[2023/04/06 13:26:05] ppcls INFO: distributed : Ture
[2023/04/06 13:26:05] ppcls INFO: epochs : 300
[2023/04/06 13:26:05] ppcls INFO: eval_during_train : True
[2023/04/06 13:26:05] ppcls INFO: eval_interval : 1
[2023/04/06 13:26:05] ppcls INFO: image_shape : [3, 448, 448]
[2023/04/06 13:26:05] ppcls INFO: output_dir : ./output/
[2023/04/06 13:26:05] ppcls INFO: pretrained_model : D:\xx\PaddleClas\output\ResNet101_vd\epoch_130
[2023/04/06 13:26:05] ppcls INFO: print_batch_step : 10
[2023/04/06 13:26:05] ppcls INFO: save_inference_dir : ./inference
[2023/04/06 13:26:05] ppcls INFO: save_interval : 1
[2023/04/06 13:26:05] ppcls INFO: use_visualdl : True
[2023/04/06 13:26:05] ppcls INFO: Infer :
[2023/04/06 13:26:05] ppcls INFO: PostProcess :
[2023/04/06 13:26:05] ppcls INFO: class_id_map_file : ./dataset/mianma_step2.txt
[2023/04/06 13:26:05] ppcls INFO: name : Topk
[2023/04/06 13:26:05] ppcls INFO: topk : 5
[2023/04/06 13:26:05] ppcls INFO: batch_size : 10
[2023/04/06 13:26:05] ppcls INFO: infer_imgs : docs/images/inference_deployment/whl_demo.jpg
[2023/04/06 13:26:05] ppcls INFO: transforms :
[2023/04/06 13:26:05] ppcls INFO: DecodeImage :
[2023/04/06 13:26:05] ppcls INFO: channel_first : False
[2023/04/06 13:26:05] ppcls INFO: to_rgb : True
[2023/04/06 13:26:05] ppcls INFO: ResizeImage :
[2023/04/06 13:26:05] ppcls INFO: resize_short : 448
[2023/04/06 13:26:05] ppcls INFO: CropImage :
[2023/04/06 13:26:05] ppcls INFO: size : 448
[2023/04/06 13:26:05] ppcls INFO: NormalizeImage :
[2023/04/06 13:26:05] ppcls INFO: mean : [0.485, 0.456, 0.406]
[2023/04/06 13:26:05] ppcls INFO: order :
[2023/04/06 13:26:05] ppcls INFO: scale : 1.0/255.0
[2023/04/06 13:26:05] ppcls INFO: std : [0.229, 0.224, 0.225]
[2023/04/06 13:26:05] ppcls INFO: ToCHWImage : None
[2023/04/06 13:26:05] ppcls INFO: Loss :
[2023/04/06 13:26:05] ppcls INFO: Eval :
[2023/04/06 13:26:05] ppcls INFO: CELoss :
[2023/04/06 13:26:05] ppcls INFO: weight : 1.0
[2023/04/06 13:26:05] ppcls INFO: Train :
[2023/04/06 13:26:05] ppcls INFO: CELoss :
[2023/04/06 13:26:05] ppcls INFO: epsilon : 0.1
[2023/04/06 13:26:05] ppcls INFO: weight : 1.0
[2023/04/06 13:26:05] ppcls INFO: Metric :
[2023/04/06 13:26:05] ppcls INFO: Eval :
[2023/04/06 13:26:05] ppcls INFO: TopkAcc :
[2023/04/06 13:26:05] ppcls INFO: topk : [1, 5]
[2023/04/06 13:26:05] ppcls INFO: Train : None
[2023/04/06 13:26:05] ppcls INFO: Optimizer :
[2023/04/06 13:26:05] ppcls INFO: lr :
[2023/04/06 13:26:05] ppcls INFO: learning_rate : 0.1
[2023/04/06 13:26:05] ppcls INFO: name : Cosine
[2023/04/06 13:26:05] ppcls INFO: momentum : 0.9
[2023/04/06 13:26:05] ppcls INFO: name : Momentum
[2023/04/06 13:26:05] ppcls INFO: regularizer :
[2023/04/06 13:26:05] ppcls INFO: coeff : 0.0001
[2023/04/06 13:26:05] ppcls INFO: name : L2
[2023/04/06 13:26:05] ppcls INFO: profiler_options : None
[2023/04/06 13:26:05] ppcls INFO: train with paddle 2.4.2 and device Place(gpu:0)
W0406 13:26:07.537612 19884 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.6, Runtime API Version: 11.6
W0406 13:26:07.540604 19884 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
path D:\xx\PaddleClas\output\ResNet101_vd\epoch_130
[2023/04/06 13:26:08] ppcls WARNING: The training strategy provided by PaddleClas is based on 4 gpus. But the number of gpu is 2 in current training. Please modify the stategy (learning rate, batch size and so on) if use this config to train.
I0406 13:26:08.834146 19884 tcp_utils.cc:181] The server starts to listen on IP_ANY:54537
I0406 13:26:08.834146 19884 tcp_utils.cc:130] Successfully connected to 10.0.0.13:54537
Traceback (most recent call last):
  File "tools/train.py", line 31, in <module>
    engine = Engine(config, mode="train")
  File "D:\xx\PaddleClas\ppcls\engine\engine.py", line 304, in __init__
    dist.init_parallel_env()
  File "D:\anaconda\envs\paddle_cls\lib\site-packages\paddle\distributed\parallel.py", line 281, in init_parallel_env
    pg = _new_process_group_impl(
  File "D:\anaconda\envs\paddle_cls\lib\site-packages\paddle\distributed\collective.py", line 207, in _new_process_group_impl
    pg = core.ProcessGroupNCCL(store, rank, world_size, place, group_id)
AttributeError: module 'paddle.fluid.libpaddle' has no attribute 'ProcessGroupNCCL'
LAUNCH INFO 2023-04-06 13:26:10,309 Pod failed
LAUNCH ERROR 2023-04-06 13:26:10,310 Container failed !!!
Container rank 1 status failed cmd ['D:\anaconda\envs\paddle_cls\python.exe', '-u', 'tools/train.py', '-c', './ppcls/configs/ImageNet/ResNet/ResNet101_vd.yaml'] code 1 log log\workerlog.1 env {'TERM_SESSION_ID': 'de808eb6-bc00-4886-b12b-9d25dd6fdb0c', 'COMMONPROGRAMW6432': 'C:\Program Files\Common Files', 'PROGRAMW6432': 'C:\Program Files', 'CONDA_DEFAULT_ENV': 'paddle_cls', 'CONDA_SHLVL': '1', 'USERNAME': 'Admini strator', 'ALLUSERSPROFILE': 'C:\ProgramData', 'USERPROFILE': 'C:\Users\Administrator', 'PROCESSOR_REVISION': '9e0d', 'IDEA_INITIAL_DIRECTORY': 'D:\xx', 'FPS_BROWSER_APP_PROFILE_STRING': 'Internet Explorer', 'PUBLIC': 'C:\Users \Public', 'CUDA_PATH_V11_6': 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6', 'PATH': 'D:\anaconda\envs\paddle_cls\lib\site-packages\paddle\fluid;D:\anaconda\envs\paddle_cls\lib\site-packages\paddle\flu id\..\libs;D:\anaconda\envs\paddle_cls;D:\anaconda\envs\paddle_cls\Library\mingw-w64\bin;D:\anaconda\envs\paddle_cls\Library\usr\bin;D:\anaconda\envs\paddle_cls\Library\bin;D:\anaconda\envs\paddle_cls\Scrip ts;D:\anaconda\envs\paddle_cls\bin;D:\anaconda\condabin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\libnvvp;D:\python\Scripts;D:\python;D:\ \Scripts;D:\;C:\Program Files\Eclipse Foundation\jdk-8.0.302.8-hotspot\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;C:\Windows\sys tem32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files\Git\cmd;C:\Program Files\TortoiseSVN\bin;:\Program Files\ffmpeg-4.3.1-2020-10-01-fu ll_build\bin;C:\Program Files\ffmpeg-4.3.1-2020-10-01-full_build\bin;C:\Program Files\Microsoft SQL Server\130\Tools\Binn;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\NVI DIA NvDLISR;C:\Program Files\dotnet;C:\Program Files\Microsoft SQL 
Server\Client SDK\ODBC\170\Tools\Binn;C:\Program Files\NVIDIA Corporation\Nsight Compute 2022.1.0;D:\zlib123dllx64\dll_x64;D:\anaconda\pip.ini;C:\Us ers\Administrator\AppData\Local\Microsoft\WindowsApps;C:\Program Files (x86)\GnuWin32\bin;G:\Anaconda3\Scripts;D:\xx\PyCharm Community Edition 2022.3.3\bin;.;C:\Users\Administrator\.dotnet\tools;D:\opencv\build\x 64\vc15\bin;.', 'DRIVERDATA': 'C:\Windows\System32\Drivers\DriverData', 'HOMEDRIVE': 'C:', 'SESSIONNAME': 'Console', 'LOGONSERVER': '\\DESKTOP-3RK0CE8', 'TERMINAL_EMULATOR': 'JetBrains-JediTerm', 'CONDA_PREFIX': 'D:\anacond a\envs\paddle_cls', 'HOMEPATH': '\Users\Administrator', 'SYSTEMROOT': 'C:\Windows', 'VS140COMNTOOLS': 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools\', 'LOCALAPPDATA': 'C:\Users\Administrator\AppData \Local', 'WXDRIVE_START_ARGS': '--wxdrive-setting=0 --disable-gpu --disable-software-rasterizer --enable-features=NetworkServiceInProcess', 'APPDATA': 'C:\Users\Administrator\AppData\Roaming', 'PROCESSOR_IDENTIFIER': 'Intel64 F amily 6 Model 158 Stepping 13, GenuineIntel', 'PATHEXT': '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.PY;.PYW', 'PSMODULEPATH': 'C:\Program Files\WindowsPowerShell\Modules;C:\Windows\system32\WindowsPowerShell\v1.0 \Modules', 'CONDA_PROMPT_MODIFIER': '(paddle_cls) ', 'PROGRAMFILES(X86)': 'C:\Program Files (x86)', 'CUDA_PATH_V10_1': 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1', 'PROMPT': '(paddle_cls) $P$G', 'NVTOOLSEXT_PATH ': 'C:\Program Files\NVIDIA Corporation\NvToolsExt\', 'OS': 'Windows_NT', 'PROCESSOR_ARCHITECTURE': 'AMD64', 'NUMBER_OF_PROCESSORS': '16', 'COMSPEC': 'C:\Windows\system32\cmd.exe', 'PROCESSOR_LEVEL': '6', 'PYCHARM COMMUNITY E DITION': 'D:\xx\PyCharm Community Edition 2022.3.3\bin;', 'USERDOMAIN_ROAMINGPROFILE': 'DESKTOP-3RK0CE8', 'WINDIR': 'C:\Windows', 'GNU_HOME': 'C:\Program Files (x86)\GnuWin32', 'PROGRAMFILES': 'C:\Program Files', 'TEMP': 'C:\ \Users\ADMINI~1\AppData\Local\Temp', 'TMP': 
'C:\Users\ADMINI~1\AppData\Local\Temp', 'COMMONPROGRAMFILES(X86)': 'C:\Program Files (x86)\Common Files', 'ONEDRIVE': 'C:\Users\Administrator\OneDrive', 'CUDA_PATH': 'C:\Pro gram Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6', 'USERDOMAIN': 'DESKTOP-3RK0CE8', 'SYSTEMDRIVE': 'C:', 'COMPUTERNAME': 'DESKTOP-3RK0CE8', 'PROGRAMDATA': 'C:\ProgramData', 'NVCUDASAMPLES10_1_ROOT': 'C:\ProgramData\NVIDIA Co rporation\CUDA Samples\v10.1', 'NVCUDASAMPLES_ROOT': 'C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.1', 'FPS_BROWSER_USER_PROFILE_STRING': 'Default', 'COMMONPROGRAMFILES': 'C:\Program Files\Common Files', 'INTELLIJ_CO MMAND_HISTFILE': 'C:\Users\Administrator\AppData\Local\JetBrains\PyCharmCE2022.3\terminal\history\PaddleClas-history', 'CUSTOM_DEVICE_ROOT': '', 'OMP_NUM_THREADS': '1', 'POD_NAME': 'ylqihp', 'PADDLE_MASTER': '10.0.0.13:54 537', 'PADDLE_GLOBAL_SIZE': '2', 'PADDLE_LOCAL_SIZE': '2', 'PADDLE_GLOBAL_RANK': '1', 'PADDLE_LOCAL_RANK': '1', 'PADDLE_NNODES': '1', 'PADDLE_TRAINER_ENDPOINTS': '10.0.0.13:54538,10.0.0.13:54539', 'PADDLE_CURRENT_ENDPOINT': '10.0.0. 13:54539', 'PADDLE_TRAINER_ID': '1', 'PADDLE_TRAINERS_NUM': '2', 'PADDLE_RANK_IN_NODE': '1', 'FLAGS_selected_gpus': '1'} LAUNCH INFO 2023-04-06 13:26:10,316 ------------------------- ERROR LOG DETAIL ------------------------- D:\xx\PaddleClas\ppcls\data\preprocess\ops\timm_autoaugment.py:39: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead. _RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC) D:\xx\PaddleClas\ppcls\data\preprocess\ops\timm_autoaugment.py:39: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead. 
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
W0406 13:26:07.538610 1480 gpu_resources.cc:61] Please NOTE: device: 1, GPU Compute Capability: 7.5, Driver API Version: 11.6, Runtime API Version: 11.6
W0406 13:26:07.540604 1480 gpu_resources.cc:91] device: 1, cuDNN Version: 8.4.
path D:\xx\PaddleClas\output\ResNet101_vd\epoch_130
I0406 13:26:09.057548 1480 tcp_utils.cc:130] Successfully connected to 10.0.0.13:54537
Traceback (most recent call last):
  File "tools/train.py", line 31, in <module>
    engine = Engine(config, mode="train")
  File "D:\xx\PaddleClas\ppcls\engine\engine.py", line 304, in __init__
    dist.init_parallel_env()
  File "D:\anaconda\envs\paddle_cls\lib\site-packages\paddle\distributed\parallel.py", line 281, in init_parallel_env
    pg = _new_process_group_impl(
  File "D:\anaconda\envs\paddle_cls\lib\site-packages\paddle\distributed\collective.py", line 207, in _new_process_group_impl
    pg = core.ProcessGroupNCCL(store, rank, world_size, place, group_id)
AttributeError: module 'paddle.fluid.libpaddle' has no attribute 'ProcessGroupNCCL'
LAUNCH INFO 2023-04-06 13:26:10,528 Exit code 1

cuicheng01 commented 1 year ago

Could you please provide your GPU model, CUDA version, cuDNN version, Docker image version, and NCCL version?

happybear1015 commented 1 year ago

RTX 2080 Ti, CUDA 11.6, cuDNN 11.2, not using Docker, NCCL 2.12

happybear1015 commented 1 year ago

Solved. It does not work on Windows; I switched to Linux and installed NCCL, and then it worked.
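
For anyone who hits the same error, here is a minimal sketch of a pre-flight check to run before `paddle.distributed.launch`. It only tests whether the native module named in the traceback exposes `ProcessGroupNCCL`. The module path `paddle.fluid.libpaddle` and the attribute name are taken from the error message above (Paddle 2.4.2). The helper name `nccl_backend_available` is hypothetical, and other Paddle versions may organize the native module differently, so treat this as an illustration rather than an official diagnostic.

```python
import importlib
import platform


def nccl_backend_available(core_module, system_name=None):
    """Return (ok, reason) for whether NCCL collective training looks possible.

    core_module is assumed to be Paddle's native extension module
    (paddle.fluid.libpaddle in the traceback from this thread).
    """
    system_name = system_name or platform.system()
    if system_name == "Windows":
        # In this thread the reporter could not get multi-GPU training
        # working on Windows and resolved it by moving to Linux.
        return False, "running on Windows; the thread's fix was to use Linux with NCCL"
    if not hasattr(core_module, "ProcessGroupNCCL"):
        return False, "this Paddle build does not expose ProcessGroupNCCL"
    return True, "ProcessGroupNCCL is present"


if __name__ == "__main__":
    try:
        # Module path assumed from the error message; it may differ by version.
        core = importlib.import_module("paddle.fluid.libpaddle")
    except ImportError as exc:
        print(f"could not import Paddle's native module: {exc}")
    else:
        print(nccl_backend_available(core))
```

If the check fails, single-GPU training (running `tools/train.py` directly, without the launcher) avoids the NCCL process group entirely.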