Scale XBLOCK in Triton reduction configs to avoid hitting the max grid size

ROCm / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

http://pytorch.org


Closed jataylo closed 1 month ago

jataylo commented 1 month ago

https://ontrack-internal.amd.com/browse/SWDEV-463139 - resolves an issue with long sequence lengths observed in gpt-fast.

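Without this change, the kernel launch exceeds the max grid size: the number of blocks along x grows as the number of x elements divided by XBLOCK, so an XBLOCK that is too small for a long sequence produces too many blocks.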
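To illustrate the idea, here is a minimal sketch of scaling XBLOCK until the grid fits. It is not the diff from this PR. The names `cdiv`, `scale_xblock`, and `max_grid_x` are hypothetical, and the actual grid limit depends on the device and runtime, so it is passed in as an assumed parameter.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division: blocks of size b needed to cover a elements."""
    return (a + b - 1) // b


def scale_xblock(xnumel: int, xblock: int, max_grid_x: int) -> int:
    """Double xblock until cdiv(xnumel, xblock) fits within max_grid_x.

    Hypothetical helper for illustration only. max_grid_x is an assumed
    limit supplied by the caller, not a value queried from the device.
    """
    if xnumel < 0 or xblock <= 0 or max_grid_x <= 0:
        raise ValueError("xnumel must be >= 0; xblock and max_grid_x must be > 0")
    # Doubling keeps a power-of-two xblock a power of two.
    while cdiv(xnumel, xblock) > max_grid_x:
        xblock *= 2
    return xblock


if __name__ == "__main__":
    # Small, made-up numbers chosen only to make the scaling visible.
    print(scale_xblock(xnumel=1000, xblock=1, max_grid_x=100))
```

Doubling is used here so that a power-of-two starting XBLOCK stays a power of two. A real config would presumably also cap XBLOCK to respect other resource limits, which this sketch omits.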

pruthvistony commented 1 month ago

@jataylo Can it be upstreamed?