NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

When I execute the example within nvidia-docker, it doesn't produce a result #408

Open Johnson13 opened 5 years ago

Johnson13 commented 5 years ago

(base) root@e9f21ccb6520:/workspace/apex/examples/simple/distributed# bash run.sh
Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic

——————————————————————
After this output, the cursor just hangs, waiting for the results. But when I set `--nproc_per_node=1` in run.sh and run it again, it works fine. There are 6 GPUs in my machine.
CUDA Version 9.0.176, PyTorch 1.1.0, Python 3.7.3
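For context, `run.sh` presumably wraps PyTorch's distributed launcher; a hedged sketch of the two invocations described above (the script name `main_amp.py` is an assumption, since the actual contents of run.sh are not shown, and `NCCL_DEBUG=INFO` is an optional debugging aid, not part of the original script):

```shell
# Assumed invocation; the real run.sh may differ.
# Hangs with all six GPUs:
python -m torch.distributed.launch --nproc_per_node=6 main_amp.py

# Works fine with a single process/GPU:
python -m torch.distributed.launch --nproc_per_node=1 main_amp.py

# Optional: NCCL_DEBUG=INFO prints NCCL's setup log, which often shows
# where a multi-GPU run is getting stuck.
NCCL_DEBUG=INFO python -m torch.distributed.launch --nproc_per_node=6 main_amp.py
```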

ptrblck commented 5 years ago

Hi @Johnson13,

do you see this issue only in the apex examples, or also with plain PyTorch code? I just reran our example and it's working fine on our system (8x P100).
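A minimal plain-PyTorch check along those lines (a sketch, not taken from the thread): it initializes `torch.distributed` with the CPU "gloo" backend and a single rank, so it runs on any machine. The real example uses NCCL across multiple GPU processes, but if even this basic setup stalls, the problem lies below apex. The address/port values are arbitrary choices.

```python
# Sanity check of torch.distributed without apex, GPUs, or NCCL.
import os
import torch
import torch.distributed as dist

# env:// rendezvous; any free port works here.
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"

dist.init_process_group(backend="gloo", rank=0, world_size=1)

t = torch.ones(2)
dist.all_reduce(t)  # with a single rank, the sum leaves the tensor unchanged

print("rank", dist.get_rank(), "tensor", t)
dist.destroy_process_group()
```

If this completes but the multi-process NCCL run hangs, the next step is to test the same collective with `--nproc_per_node=6` and the NCCL backend.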