onpix opened this issue 3 years ago (Open)
A good way to initialize a tensor is:

```python
tensor = torch.FloatTensor(...).type_as(input)
```

or

```python
tensor = torch.FloatTensor(...).to(input.device)
```

which guarantees that all tensors are on the same device.
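A minimal sketch of the difference between the two approaches (the function and variable names here are illustrative, not from the repo): `type_as` matches both the dtype and the device of the reference tensor, while `.to(input.device)` only matches the device.

```python
import torch

def make_weights(input):
    # type_as inherits BOTH dtype and device from `input`, so the same
    # code works whether the model runs on CPU or GPU.
    w = torch.ones(3).type_as(input)
    # .to(input.device) only matches the device; the dtype stays float32.
    w2 = torch.ones(3).to(input.device)
    return w, w2

x = torch.randn(4, dtype=torch.float64)  # stands in for a network input
w, w2 = make_weights(x)
assert w.device == x.device and w.dtype == x.dtype   # dtype followed
assert w2.device == x.device and w2.dtype == torch.float32
```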
In some situations, the training code raises a device error:

```
CUDA error: an illegal memory access was encountered
```

After debugging, I found that the main reason is that some tensors used during training are not on the same device. For example, in `Generator3DLUT_identity` and `Generator3DLUT_zero`, `self.LUT.device` is `cpu`. In `TrilinearInterpolationFunction`, `int_package` and `float_package` are also on `cpu`. However, the input and output of the network are CUDA tensors, so a device error sometimes occurs when running the model. To solve the problem, it is better to initialize all tensors with the same, dynamically determined device and dtype as the input, instead of initializing them on a fixed device.
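One way to achieve this (a minimal sketch, not the repo's actual implementation; the LUT initialization and the `int_package`/`float_package` shapes here are placeholders) is to register the LUT as an `nn.Parameter`, so that `model.cuda()` / `model.to(device)` moves it with the module, and to allocate any temporaries inside `forward` from the input tensor:

```python
import torch
import torch.nn as nn

class Generator3DLUT_identity(nn.Module):
    # Sketch: registering the LUT as a Parameter lets model.to(device)
    # move it along with the rest of the module, so self.LUT is never
    # stuck on cpu while the inputs are CUDA tensors.
    def __init__(self, dim=33):
        super().__init__()
        steps = torch.linspace(0.0, 1.0, dim)
        # Identity LUT: each output channel equals its input coordinate.
        lut = torch.stack(torch.meshgrid(steps, steps, steps, indexing="ij"), dim=0)
        self.LUT = nn.Parameter(lut)  # moves with the module

    def forward(self, x):
        # Temporaries should follow the input's device instead of being
        # allocated on a fixed device (placeholder buffers for illustration).
        int_package = torch.empty_like(x, dtype=torch.int32)
        float_package = torch.empty_like(x)
        return self.LUT, int_package, float_package
```

With this pattern, a single `model.to(x.device)` call before training keeps the LUT, the temporaries, and the network inputs on the same device.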
Could you explain more? I met this problem, but I don't know how to do this. Do you mean that we should put the LUT and the `TrilinearInterpolationFunction` tensors on the GPU?