Closed naoyam closed 1 year ago
Odd that this wasn't caught before. I'm not sure why this is happening or why this fixes it, but I'm comfortable with the change.
Bandwidth and speedup curves:
Speedup histogram:
The overall results look mostly within random noise. Some benchmarks show a 10% degradation, but that is most likely because they are very short-running kernels.
Fixes #2560
Previously, the rfactor ID sets gathered by the CA map did not include reduction rfactor IDs. However, in the repro of #2560, traversing through a reduction rfactor ID is necessary to index partially inlined broadcast tensors. I don't see any reason not to include reduction rfactor IDs in the CA map, nor any potential side effect. Reduction rfactor IDs have been relatively trivial compared to view rfactor IDs because the IDs are reduced away, but they still need to be processed by the ID traversal logic for indexing.
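
To illustrate the change described above, here is a minimal toy model in Python. It is not nvFuser code: `ToyIterDomain`, `gather_rfactor_ids`, and the `include_reduction_rfactor` flag are hypothetical names invented for this sketch, and the real CA map logic is more involved. It only shows the difference between skipping and keeping reduction rfactor IDs when the set is gathered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyIterDomain:
    """Hypothetical stand-in for an iteration domain (not the nvFuser class)."""
    name: str
    is_rfactor: bool = False
    is_reduction: bool = False


def gather_rfactor_ids(ids, include_reduction_rfactor):
    """Collect the names of rfactor IDs, optionally skipping reduction ones."""
    selected = set()
    for iter_domain in ids:
        if not iter_domain.is_rfactor:
            continue
        # Assumed old behavior: reduction rfactor IDs were filtered out here.
        if iter_domain.is_reduction and not include_reduction_rfactor:
            continue
        selected.add(iter_domain.name)
    return selected


toy_ids = [
    ToyIterDomain("i0"),
    ToyIterDomain("view_rf", is_rfactor=True),
    ToyIterDomain("red_rf", is_rfactor=True, is_reduction=True),
]

old_set = gather_rfactor_ids(toy_ids, include_reduction_rfactor=False)
new_set = gather_rfactor_ids(toy_ids, include_reduction_rfactor=True)
```

In this toy setup, `red_rf` is only present in `new_set`, so any traversal that must pass through it would only find it after the change.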