Closed whuhxb closed 2 years ago
Hi,
You may need to check the GPU device. The code runs in a multi-GPU setting by default. If you want to use a single GPU or CPU only, the corresponding config should be changed, e.g. by setting --gpu 1 to use a single GPU.
Best.
@LiyaoTang Hi, which file is used to set the GPU device?
@LiyaoTang
Even when using multiple GPUs, the following error still occurs:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation gpu_0/model/resnet_backbone/res1_input_conv/L2Loss: node gpu_0/model/resnet_backbone/res1_input_conv/L2Loss (defined at /export/home/contrastBoundary/tensorflow/models/basic_operators.py:128) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1 ]. Make sure the device specification refers to a valid device. [[gpu_0/model/resnet_backbone/res1_input_conv/L2Loss]]
Errors may have originated from an input operation. Input Source operations connected to node gpu_0/model/resnet_backbone/res1_input_conv/L2Loss: model/resnet_backbone/res1_input_conv/weights/read (defined at /export/home/contrastBoundary/tensorflow/models/basic_operators.py:96)
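For anyone reading a similar trace, the key detail is the mismatch between the requested device and the list of available devices. The following is a minimal sketch, using only the Python standard library, that pulls both out of the error text. The helper names `parse_device_error` and `has_plain_gpu` are made up for illustration and are not part of the repository or of TensorFlow; `SAMPLE` is a shortened copy of the message above.

```python
import re

# Shortened copy of the error message posted above.
SAMPLE = (
    "Cannot assign a device for operation gpu_0/model/resnet_backbone/res1_input_conv/L2Loss: "
    "node gpu_0/model/resnet_backbone/res1_input_conv/L2Loss was explicitly assigned to "
    "/device:GPU:0 but available devices are [ "
    "/job:localhost/replica:0/task:0/device:CPU:0, "
    "/job:localhost/replica:0/task:0/device:XLA_CPU:0, "
    "/job:localhost/replica:0/task:0/device:XLA_GPU:0, "
    "/job:localhost/replica:0/task:0/device:XLA_GPU:1 ]. "
    "Make sure the device specification refers to a valid device."
)


def parse_device_error(message):
    """Return (requested_device, available_devices) found in the error text."""
    requested = re.search(r"explicitly assigned to (/device:\w+:\d+)", message)
    # Each available device looks like /job:.../device:<KIND>:<INDEX>.
    available = re.findall(r"/job:\S+?/device:(\w+):(\d+)", message)
    return (
        requested.group(1) if requested else None,
        [f"{kind}:{index}" for kind, index in available],
    )


def has_plain_gpu(devices):
    """True if a regular GPU:<n> device is listed (XLA_GPU entries do not count)."""
    return any(device.startswith("GPU:") for device in devices)


requested, available = parse_device_error(SAMPLE)
print("requested:", requested)
print("available:", available)
print("plain GPU device present:", has_plain_gpu(available))
```

In this trace the graph asks for /device:GPU:0, while the available list contains only CPU and XLA devices, so the placement cannot succeed as configured.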
Could you post the logged config and the output of the nvidia-smi command? There may also be a line in the log indicating the available GPU devices.
Or, could you post the full log here?
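If it helps to collect this from inside a job script, here is a small sketch that runs `nvidia-smi -L` (which lists the GPUs the driver can see) and returns its output lines. The function name `list_gpus` is only an illustration and is not part of the repository; it returns an empty list when the tool is missing or fails.

```python
import shutil
import subprocess


def list_gpus():
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The driver tools are not installed or not on PATH.
        return []
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return []
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
if gpus:
    for line in gpus:
        print(line)
else:
    print("nvidia-smi reported no GPUs (or is not available on this node)")
```

Running it on the same node that executes the training job (for example inside the same srun allocation) matters, since a login node and a compute node can see different devices.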
Best.
@LiyaoTang The log file is too long, so I have e-mailed it to you. Looking forward to your answer. Thanks a lot.
@LiyaoTang
Hi, I can't use the GPU right now, so I am running on CPU only. That is when the above error occurs.
Hi,
That's the problem. You can pass the flag --gpu 0 to run on CPU only.
@LiyaoTang
Where do I set the flag --gpu 0?
This is a command-line option. Please check main.py. You can also just modify the config file config/s3dis.py.
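To illustrate how such a flag is typically wired up, here is a hypothetical sketch and not the actual contents of main.py. It assumes --gpu is the number of GPUs to use, with 0 meaning CPU only, as the replies in this thread suggest. The default of 2, the helper names, and the use of CUDA_VISIBLE_DEVICES are all assumptions made for the example.

```python
import argparse
import os


def build_parser():
    """Build a parser with a --gpu flag (assumed to mean: number of GPUs)."""
    parser = argparse.ArgumentParser(description="Hypothetical launcher sketch")
    parser.add_argument(
        "--gpu",
        type=int,
        default=2,  # placeholder default; the real default is not stated in this thread
        help="number of GPUs to use; 0 means CPU only (assumed semantics)",
    )
    return parser


def visible_devices(num_gpus):
    """Return a CUDA_VISIBLE_DEVICES value exposing the first `num_gpus` GPUs."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    # An empty string hides every GPU from CUDA, which forces CPU-only execution.
    return ",".join(str(index) for index in range(num_gpus))


# Parse an explicit argument list here so the sketch runs the same everywhere;
# a real script would call parse_args() with no arguments to read the command line.
args = build_parser().parse_args(["--gpu", "0"])
os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices(args.gpu)
print("gpu count:", args.gpu)
print("CUDA_VISIBLE_DEVICES:", repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```

With a script structured like this, the call would look like `python main.py --gpu 0`. Note that CUDA_VISIBLE_DEVICES only takes effect if it is set before the framework initializes CUDA.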
Hi, have you ever encountered this error?
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation gpu_0/model/resnet_backbone/res1_input_conv/L2Loss: node gpu_0/model/resnet_backbone/res1_input_conv/L2Loss (defined at /export/home/contrastBoundary/tensorflow/models/basic_operators.py:128) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device. [[gpu_0/model/resnet_backbone/res1_input_conv/L2Loss]]
Errors may have originated from an input operation. Input Source operations connected to node gpu_0/model/resnet_backbone/res1_input_conv/L2Loss: model/resnet_backbone/res1_input_conv/weights/read (defined at /export/contrastBoundary/tensorflow/models/basic_operators.py:96)
srun: error: node16: task 0: Exited with exit code 1