(base) root@e9f21ccb6520:/workspace/apex/examples/simple/distributed# bash run.sh
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
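For context, the banner above is printed by the amp.initialize call in the example. A minimal sketch of that call (the Linear model and SGD optimizer here are placeholders, not the example's actual code):

    import torch
    from apex import amp

    model = torch.nn.Linear(10, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    # O1 patches torch functions and Tensor methods to cast automatically;
    # this call prints the "Selected optimization level O1" banner above.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")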
——————————————————————
The cursor just hangs there waiting for results indefinitely. But when I set --nproc_per_node=1 in run.sh and rerun it, it works fine.
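For reference, run.sh wraps torch.distributed.launch, which spawns one worker process per GPU. A sketch of the worker-side setup involved (the launch flags and script name below are illustrative, not the script's exact contents):

    # run.sh launches something like:
    #   python -m torch.distributed.launch --nproc_per_node=6 main.py
    import argparse
    import torch

    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)  # injected by the launcher
    args = parser.parse_args()

    torch.cuda.set_device(args.local_rank)
    # With the nccl backend this call blocks until all --nproc_per_node
    # workers have joined the process group, which is where a hang like
    # the one above typically shows up.
    torch.distributed.init_process_group(backend="nccl", init_method="env://")

With --nproc_per_node=1 only a single worker starts, so the multi-process rendezvous never happens, which matches the single-process case working.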
My machine has 6 GPUs.
CUDA version: 9.0.176
PyTorch 1.1.0
Python 3.7.3
Do you see this issue only in the apex examples, or also with plain PyTorch code?
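A minimal plain-PyTorch check, independent of apex, could look like this (a sketch; the file name ddp_check.py is hypothetical, launched with python -m torch.distributed.launch --nproc_per_node=6 ddp_check.py):

    import argparse
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel

    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args()

    torch.cuda.set_device(args.local_rank)
    dist.init_process_group(backend="nccl", init_method="env://")

    # Tiny model wrapped in DDP; backward() exercises the gradient all-reduce.
    model = DistributedDataParallel(
        torch.nn.Linear(10, 10).cuda(), device_ids=[args.local_rank]
    )
    loss = model(torch.randn(4, 10).cuda()).sum()
    loss.backward()
    print("rank", dist.get_rank(), "finished all-reduce")

If this also hangs with more than one process, the problem is likely in the distributed setup (NCCL/CUDA) rather than in apex.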
I just reran our example and it's working fine on our system (8x P100).