balbasty closed this 3 years ago
It was a bit more than just a switch. We should only use int32 offsets when it is possible, i.e., when every element of the tensor (which can be a view) is reachable with an int32 offset. I have used something similar to this fix in PyTorch's implementation of GridSampler. I had to refactor a bit so that all the allocation is done before dispatching.
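For illustration, here is a rough sketch of the reachability condition described above, written in plain Python over sizes and strides. It is not the code from this fix. The names `max_offset` and `can_use_int32_offsets` are hypothetical, and the assumption is that strides are expressed in elements and that the linear index must also fit in int32.

```python
INT32_MAX = 2**31 - 1


def max_offset(sizes, strides):
    """Largest element offset (in elements) touched by a strided view."""
    if any(n == 0 for n in sizes):
        return 0  # an empty view touches no element
    # The farthest element sits at index (size - 1) along every dimension.
    return sum((n - 1) * abs(s) for n, s in zip(sizes, strides))


def can_use_int32_offsets(sizes, strides):
    """True if both the linear index and every offset fit in int32."""
    numel = 1
    for n in sizes:
        numel *= n
    # Assumption: kernels also iterate over a linear index in [0, numel).
    return numel <= INT32_MAX and max_offset(sizes, strides) <= INT32_MAX


# A small contiguous tensor is fine; a small view with a huge stride is not.
print(can_use_int32_offsets((4, 5), (5, 1)))
print(can_use_int32_offsets((2, 2), (2**31, 1)))
```

The second case is the reason the check has to look at strides rather than just the number of elements: a view can hold only a few elements and still reach past the int32 range.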
CUDA should use int32 offsets and CPP should use int64 offsets.