onpix opened this issue 3 years ago (Open)
A good way to initialize a tensor is:

```python
tensor = torch.FloatTensor(...).type_as(input)
```

or

```python
tensor = torch.FloatTensor(...).to(input.device)
```

which guarantees that all tensors are on the same device.
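A minimal sketch of the difference between the two approaches (the function and variable names here are illustrative, not from the repo): `type_as` matches both the dtype and the device of the reference tensor, while `.to(input.device)` only matches the device.

```python
import torch

def make_weights(input):
    # type_as inherits BOTH dtype and device from `input`, so the same
    # code works whether the model runs on CPU or GPU.
    w = torch.ones(3).type_as(input)
    # .to(input.device) only matches the device; the dtype stays float32.
    w2 = torch.ones(3).to(input.device)
    return w, w2

x = torch.randn(4, dtype=torch.float64)  # stands in for a network input
w, w2 = make_weights(x)
assert w.device == x.device and w.dtype == x.dtype   # dtype followed
assert w2.device == x.device and w2.dtype == torch.float32
```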
In some situations, the training code raises a device error:

```
CUDA error: an illegal memory access was encountered
```

After debugging, I found that the main reason is that some tensors used during training are not on the same device. For example, in `Generator3DLUT_identity` and `Generator3DLUT_zero`, `self.LUT.device` is `cpu`. In `TrilinearInterpolationFunction`, `int_package` and `float_package` are also on `cpu`. However, the input and output of the network are CUDA tensors, so a device error sometimes occurs when running the model. To solve the problem, it is better to initialize all tensors with the same, dynamically determined device and dtype as the input, instead of initializing them on a fixed device.
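One way to achieve this (a minimal sketch, not the repo's actual implementation; the LUT initialization and the `int_package`/`float_package` shapes here are placeholders) is to register the LUT as an `nn.Parameter`, so that `model.cuda()` / `model.to(device)` moves it with the module, and to allocate any temporaries inside `forward` from the input tensor:

```python
import torch
import torch.nn as nn

class Generator3DLUT_identity(nn.Module):
    # Sketch: registering the LUT as a Parameter lets model.to(device)
    # move it along with the rest of the module, so self.LUT is never
    # stuck on cpu while the inputs are CUDA tensors.
    def __init__(self, dim=33):
        super().__init__()
        steps = torch.linspace(0.0, 1.0, dim)
        # Identity LUT: each output channel equals its input coordinate.
        lut = torch.stack(torch.meshgrid(steps, steps, steps, indexing="ij"), dim=0)
        self.LUT = nn.Parameter(lut)  # moves with the module

    def forward(self, x):
        # Temporaries should follow the input's device instead of being
        # allocated on a fixed device (placeholder buffers for illustration).
        int_package = torch.empty_like(x, dtype=torch.int32)
        float_package = torch.empty_like(x)
        return self.LUT, int_package, float_package
```

With this pattern, a single `model.to(x.device)` call before training keeps the LUT, the temporaries, and the network inputs on the same device.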
Could you explain more? I met this problem, but I don't know how to do this. Do you mean that we should put the LUT and the `TrilinearInterpolationFunction` tensors on the GPU?