csarofeen / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

Fix indexing failure with non-view rfactor #2562

Closed: naoyam closed this 1 year ago

naoyam commented 1 year ago

Fixes #2560

Previously, the rfactor ID sets gathered by the CA map did not include reduction rfactor IDs. However, in the repro of #2560, traversing through a reduction rfactor ID is necessary to index partially inlined broadcast tensors. I don't see any reason not to include reduction rfactor IDs in the CA map, nor any potential side effects. Reduction rfactor has been relatively trivial compared to view rfactor, since its rfactor IDs are reduced away, but those IDs still need to be processed by the ID traversal logic for indexing.
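The effect of the change can be illustrated with a toy model. This is a hypothetical sketch: `Tensor`, `RFactorKind`, `gather_rfactor_ids`, and the ID names are invented for illustration and are not nvFuser's actual ComputeAtMap API. It only shows the point made above: when reduction rfactor tensors are skipped, their rfactor IDs never enter the gathered set, so an indexing traversal that needs to pass through them cannot.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Set


class RFactorKind(Enum):
    """Hypothetical classification of how a tensor got its rfactor domain."""
    NONE = auto()       # no rfactor domain
    VIEW = auto()       # rfactor domain created by a view/reshape
    REDUCTION = auto()  # rfactor domain created by a reduction rfactor


@dataclass
class Tensor:
    """Hypothetical stand-in for a tensor and the IDs in its rfactor domain."""
    name: str
    rfactor_kind: RFactorKind
    rfactor_ids: List[str]


def gather_rfactor_ids(tensors: List[Tensor], include_reduction: bool) -> Set[str]:
    """Collect rfactor IDs that the indexing traversal may pass through.

    include_reduction=False models the old behavior (view rfactors only);
    include_reduction=True models the behavior after this PR.
    """
    gathered: Set[str] = set()
    for tv in tensors:
        if tv.rfactor_kind is RFactorKind.VIEW:
            gathered.update(tv.rfactor_ids)
        elif tv.rfactor_kind is RFactorKind.REDUCTION and include_reduction:
            gathered.update(tv.rfactor_ids)
    return gathered


# Hypothetical fusion: one view-rfactor tensor, one reduction-rfactor tensor.
tensors = [
    Tensor("tv1", RFactorKind.VIEW, ["iV0", "iV1"]),
    Tensor("tv2", RFactorKind.REDUCTION, ["rR0o", "rR0i"]),
    Tensor("tv3", RFactorKind.NONE, []),
]

old = gather_rfactor_ids(tensors, include_reduction=False)
new = gather_rfactor_ids(tensors, include_reduction=True)
print("old:", sorted(old))
print("new:", sorted(new))
print("newly reachable:", sorted(new - old))
```

In this model the only difference between the two calls is that the reduction rfactor IDs `rR0o` and `rR0i` become part of the set, which mirrors the PR's description of the fix rather than its actual code.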

csarofeen commented 1 year ago

Odd that this wasn't caught before. I'm not sure why this is happening or why this fixes it, but I'm comfortable with the change.

naoyam commented 1 year ago

[Figure: bandwidth and speedup curves]

[Figure: speedup histogram]

Overall, the results appear to be mostly within random noise. A few benchmarks show around 10% degradation, but those are most likely noise as well, since they are very short-running kernels.
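The reasoning about short-running kernels comes down to simple arithmetic: the same absolute timing jitter is a much larger fraction of a short kernel's runtime. The sketch below assumes a hypothetical fixed jitter of 1 us per measurement; that number and the runtimes are illustrative only, not measurements from these benchmarks.

```python
# Hypothetical, not measured: a fixed absolute timing jitter per measurement.
JITTER_US = 1.0  # assumed jitter in microseconds


def relative_noise(runtime_us: float, jitter_us: float = JITTER_US) -> float:
    """Return the jitter as a fraction of the kernel's runtime."""
    if runtime_us <= 0:
        raise ValueError("runtime must be positive")
    return jitter_us / runtime_us


# Illustrative runtimes: short, medium, and long kernels.
for runtime_us in (10.0, 100.0, 1000.0):
    print(f"{runtime_us:7.1f} us kernel -> {relative_noise(runtime_us):.1%} of runtime")
```

Under this assumption, 1 us of jitter is 10% of a 10 us kernel but only 0.1% of a 1 ms kernel, which is why apparent regressions of that size in very short kernels are weak evidence of a real slowdown.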