Single-GPU training works fine: python -m paddle.distributed.launch --gpus '0' train.py
But multi-GPU training: python -m paddle.distributed.launch --gpus '0,1' train.py
fails with the following error:
NotImplementedError: (Unimplemented) Place CUDAPlace(0) is not supported. Please check that your paddle compiles with WITH_GPU, WITH_XPU or WITH_ASCEND_CL option or check that your train process set the correct device id if you use Executor. (at /paddle/paddle/fluid/platform/device_context.cc:88)
[operator < gaussian_random > error]
Traceback (most recent call last):
File "train.py", line 204, in <module>
model = MNIST()
File "train.py", line 94, in __init__
self.conv1 = Conv2D(in_channels=1, out_channels=20, kernel_size=5, stride=1, padding=2)
File "/root/anaconda3/lib/python3.8/site-packages/paddle/nn/layer/conv.py", line 633, in __init__
super(Conv2D, self).__init__(
File "/root/anaconda3/lib/python3.8/site-packages/paddle/nn/layer/conv.py", line 132, in __init__
self.weight = self.create_parameter(
File "/root/anaconda3/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 411, in create_parameter
return self._helper.create_parameter(temp_attr, shape, dtype, is_bias,
File "/root/anaconda3/lib/python3.8/site-packages/paddle/fluid/layer_helper_base.py", line 369, in create_parameter
return self.main_program.global_block().create_parameter(
File "/root/anaconda3/lib/python3.8/site-packages/paddle/fluid/framework.py", line 2895, in create_parameter
initializer(param, self)
File "/root/anaconda3/lib/python3.8/site-packages/paddle/fluid/initializer.py", line 355, in __call__
op = block.append_op(
File "/root/anaconda3/lib/python3.8/site-packages/paddle/fluid/framework.py", line 2921, in append_op
_dygraph_tracer().trace_op(type,
File "/root/anaconda3/lib/python3.8/site-packages/paddle/fluid/dygraph/tracer.py", line 43, in trace_op
self.trace(type, inputs, outputs, attrs,
NotImplementedError: (Unimplemented) Place CUDAPlace(0) is not supported. Please check that your paddle compiles with WITH_GPU, WITH_XPU or WITH_ASCEND_CL option or check that your train process set the correct device id if you use Executor. (at /paddle/paddle/fluid/platform/device_context.cc:88)
[operator < gaussian_random > error]
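
Since the message suggests checking that each training process sets the correct device id, one way to narrow this down might be to print the device-related environment variables at the very top of train.py, before the model is built. The snippet below is only a rough sketch using the standard library: the helper name describe_device_env is made up for illustration, and the variable names FLAGS_selected_gpus, PADDLE_TRAINER_ID and PADDLE_TRAINERS_NUM are assumed to be what paddle.distributed.launch exports to each worker, which may differ between Paddle versions.

```python
import os

# Variables assumed to be relevant here. CUDA_VISIBLE_DEVICES is the standard
# CUDA variable; the FLAGS_* / PADDLE_* names are assumptions about what
# paddle.distributed.launch exports to each worker process.
DEVICE_ENV_KEYS = (
    "CUDA_VISIBLE_DEVICES",
    "FLAGS_selected_gpus",
    "PADDLE_TRAINER_ID",
    "PADDLE_TRAINERS_NUM",
)


def describe_device_env(env=None):
    """Return a one-line summary of the device-related variables in env.

    env defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    parts = [f"{key}={env.get(key, '<unset>')}" for key in DEVICE_ENV_KEYS]
    return f"pid={os.getpid()} " + " ".join(parts)


if __name__ == "__main__":
    # Each worker started by the launcher prints its own line.
    print(describe_device_env())
```

If the two workers report a device id that is not visible to them (for example a selected GPU that is missing from CUDA_VISIBLE_DEVICES), that would point to a device-id mismatch rather than a Paddle build without GPU support.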