csarofeen / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

Clean up index type handling #2570

Closed — naoyam closed this 1 year ago

naoyam commented 1 year ago

This PR cleans up the code around the kernel index mode/type.

I'm sure we could do more, but I'll stop here for now.

Note that this does not address the issues @mmigdal-nv attempted to fix (#2522). Notably, these remain:

  1. We only look at kernel inputs to determine the index type. Intermediate and output tensors are not considered, which can result in an underestimate of the required index type.
  2. Once a Fusion is lowered to a Kernel, its index type cannot be changed. I believe we could relax this constraint, but it's unclear how important that would be. We don't want to recompile back and forth between int32 and int64, so we would need to keep two compiled kernel images for a single Kernel. This would reduce the overhead of lowering a Fusion to a Kernel, since lowering would need to be done just once for both int32 and int64, but the nvrtc compilation would still need to be done twice. And it only matters when the (rest of the) scheduler heuristics are the same across problem sizes ranging from small enough to use int32 to large enough to require int64.

I think issue 1 is important and should be fixed, but issue 2 doesn't seem as urgent.
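To illustrate issue 1, here is a minimal sketch (not nvFuser's actual API; the function name and the int32/int64 cutoff policy are simplified assumptions) of why deriving the index type from inputs alone can underestimate: two small inputs can broadcast to an output whose element count no longer fits in int32.

```python
# Hypothetical sketch of index-type selection, NOT nvFuser's real implementation.
INT32_MAX = 2**31 - 1

def index_type(numels):
    """Pick 'int32' if every tensor's element count fits in int32, else 'int64'."""
    return "int32" if all(n <= INT32_MAX for n in numels) else "int64"

# Two small inputs, e.g. shapes (50000, 1) and (1, 50000), broadcasting
# to a (50000, 50000) output with 2.5e9 elements (exceeds int32 range).
inputs = [50_000, 50_000]
output = 50_000 * 50_000

print(index_type(inputs))             # inputs alone suggest int32
print(index_type(inputs + [output]))  # including the output forces int64
```

Considering only `inputs` yields int32, while including the output correctly forces int64, which is the underestimate described above.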

naoyam commented 1 year ago

All tests are green.

I was concerned that the benchmarks might change, but I confirmed that all of the generated CUDA kernels (i.e., __tmp_kernel*.cu) are exactly the same as before, so I'm pretty confident nothing has changed.