LiyaoTang / contrastBoundary

Contrastive Boundary Learning for Point Cloud Segmentation (CVPR2022)
MIT License

InvalidArgumentError: Cannot assign a device for operation #4

Closed: whuhxb closed this issue 2 years ago

whuhxb commented 2 years ago

Hi, have you ever run into this error?

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation gpu_0/model/resnet_backbone/res1_input_conv/L2Loss: node gpu_0/model/resnet_backbone/res1_input_conv/L2Loss (defined at /export/home/contrastBoundary/tensorflow/models/basic_operators.py:128) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device. [[gpu_0/model/resnet_backbone/res1_input_conv/L2Loss]]

Errors may have originated from an input operation. Input Source operations connected to node gpu_0/model/resnet_backbone/res1_input_conv/L2Loss: model/resnet_backbone/res1_input_conv/weights/read (defined at /export/contrastBoundary/tensorflow/models/basic_operators.py:96)

srun: error: node16: task 0: Exited with exit code 1

LiyaoTang commented 2 years ago

Hi,

You may need to check the GPU device. The code runs in a multi-GPU setting by default. If you want to use a single GPU or only the CPU, the corresponding config should be changed, e.g. set --gpu 1 to use a single GPU.
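As a rough illustration of why the device setting matters: judging from the node names in the error (gpu_0/model/...), each model replica is built under a per-GPU scope and explicitly pinned to /device:GPU:0, /device:GPU:1, and so on, so graph placement fails when no such device exists. The sketch below is a hypothetical helper, not code from this repository; `tower_devices` and `unplaceable` are illustrative names, and the mapping (0 for CPU only, N for N GPUs) only assumes the --gpu convention described in this thread.

```python
from typing import List


def tower_devices(num_gpus: int) -> List[str]:
    """Hypothetical helper: map a GPU count (as in --gpu N) to pinned devices.

    Assumed convention from the thread: 0 -> CPU only,
    N >= 1 -> one replica per GPU (scopes gpu_0, gpu_1, ...).
    """
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    if num_gpus == 0:
        return ["/device:CPU:0"]
    return [f"/device:GPU:{i}" for i in range(num_gpus)]


def unplaceable(requested: List[str], available: List[str]) -> List[str]:
    """Return the requested devices that do not appear in the available list."""
    # Compare only the part after "/device:" (e.g. "GPU:0"), so a short name
    # like "/device:GPU:0" matches a fully qualified one such as
    # "/job:localhost/replica:0/task:0/device:GPU:0".
    have = {d.split("/device:")[-1] for d in available}
    return [d for d in requested if d.split("/device:")[-1] not in have]
```

With the devices reported in the first error (CPU:0, XLA_CPU:0, XLA_GPU:0), `unplaceable(tower_devices(1), devices)` returns `['/device:GPU:0']`, which mirrors the InvalidArgumentError, whereas `tower_devices(0)` leaves nothing unplaceable.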

Best.

whuhxb commented 2 years ago

@LiyaoTang Hi, which file is used to set the GPU device?

whuhxb commented 2 years ago

@LiyaoTang

When using multiple GPUs, the same error still occurs:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation gpu_0/model/resnet_backbone/res1_input_conv/L2Loss: node gpu_0/model/resnet_backbone/res1_input_conv/L2Loss (defined at /export/home/contrastBoundary/tensorflow/models/basic_operators.py:128) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1 ]. Make sure the device specification refers to a valid device. [[gpu_0/model/resnet_backbone/res1_input_conv/L2Loss]]

Errors may have originated from an input operation. Input Source operations connected to node gpu_0/model/resnet_backbone/res1_input_conv/L2Loss: model/resnet_backbone/res1_input_conv/weights/read (defined at /export/home/contrastBoundary/tensorflow/models/basic_operators.py:96)

LiyaoTang commented 2 years ago

Could you post the logged config and the output of the nvidia-smi command? There may also be a line in the log indicating the available GPU devices.
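If the full log is hard to share, the device list can also be read straight from the error text. A minimal sketch, assuming the message format shown in the errors in this thread: it extracts the `available devices are [ ... ]` list and checks whether any plain GPU device is present. Only `/device:GPU:N` entries can host ops pinned to a GPU; seeing `XLA_GPU` entries without a matching `GPU` entry often means TensorFlow could not load its CUDA libraries, though that is an assumption to confirm against the log and the nvidia-smi output. The function names are hypothetical.

```python
import re
from typing import List


def available_devices(error_message: str) -> List[str]:
    """Extract the device list from a TF "Cannot assign a device" message."""
    # Non-greedy match stops at the first "]" after the list opens,
    # so the trailing "[[node_name]]" part of the message is ignored.
    match = re.search(r"available devices are \[(.*?)\]", error_message)
    if match is None:
        return []
    return [d.strip() for d in match.group(1).split(",") if d.strip()]


def has_usable_gpu(devices: List[str]) -> bool:
    """True if any entry is a plain GPU device (XLA_GPU entries do not count)."""
    return any(d.split("/device:")[-1].startswith("GPU:") for d in devices)
```

Applied to the multi-GPU error above, `available_devices` returns four entries (CPU:0, XLA_CPU:0, XLA_GPU:0, XLA_GPU:1) and `has_usable_gpu` returns `False`, consistent with the ops pinned to /device:GPU:0 having nowhere to go.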

Or, could you post the full log here?

Best.

whuhxb commented 2 years ago

@LiyaoTang The log file is too long, so I have e-mailed it to you. Looking forward to your reply. Thanks a lot.

whuhxb commented 2 years ago

@LiyaoTang

Hi, I can't use a GPU right now, so I am running on CPU only, and the above error occurs.

LiyaoTang commented 2 years ago

Hi,

That's the problem: by default the code places operations on GPUs. Use the flag --gpu 0 to run on CPU only.

whuhxb commented 2 years ago

@LiyaoTang

Where should I set the flag --gpu 0?

LiyaoTang commented 2 years ago

This is a command-line option; please check main.py. Alternatively, you can modify the config file config/s3dis.py directly.
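For reference, a common pattern for this kind of setting is a command-line flag that, when given, overrides the config-file default. The sketch below is hypothetical: it does not reproduce the repository's main.py, and `DEFAULT_CONFIG` with its `gpu_num` attribute (and its default of 2) stands in for whatever config/s3dis.py actually defines.

```python
import argparse
from types import SimpleNamespace
from typing import List, Optional

# Hypothetical stand-in for a config such as config/s3dis.py;
# the real attribute name and default value may differ.
DEFAULT_CONFIG = SimpleNamespace(gpu_num=2)


def parse_config(argv: Optional[List[str]] = None) -> SimpleNamespace:
    parser = argparse.ArgumentParser()
    # Convention from the thread: 0 -> CPU only, 1 -> single GPU, N -> N GPUs.
    parser.add_argument("--gpu", type=int, default=None,
                        help="number of GPUs to use; 0 runs on CPU only")
    args = parser.parse_args(argv)
    # Copy the defaults so the module-level config is never mutated.
    config = SimpleNamespace(**vars(DEFAULT_CONFIG))
    if args.gpu is not None:  # a value given on the command line wins
        config.gpu_num = args.gpu
    return config
```

Under these assumptions, passing `--gpu 0` yields `gpu_num == 0` (CPU only), while omitting the flag keeps the config-file value; editing the config file changes the default for every run instead of a single invocation.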