ROCm / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

Scale XBLOCK in triton reduction configs to avoid hitting max grid #1434

Closed: jataylo closed this 1 month ago

jataylo commented 1 month ago

https://ontrack-internal.amd.com/browse/SWDEV-463139 - Resolves an issue observed with long sequence lengths in gpt-fast.

Without this change, the kernel launch exceeds the maximum grid size because XBLOCK is too small.
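
For illustration only, here is a minimal sketch of the idea, not the code in this PR. It assumes the grid's x dimension is `ceil(xnumel / XBLOCK)` and that XBLOCK is doubled until that count fits under a limit. The names `cdiv` and `scale_xblock` and the default limit of `2**31 - 1` are hypothetical placeholders; the real limit depends on the device and runtime.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division: number of blocks of size b needed to cover a elements."""
    return (a + b - 1) // b


def scale_xblock(xnumel: int, xblock: int, max_grid_x: int = 2**31 - 1) -> int:
    """Double xblock until the x grid dimension fits within max_grid_x.

    max_grid_x is an assumed limit for illustration, not a value taken
    from the PR or from any specific device.
    """
    if xnumel <= 0 or xblock <= 0 or max_grid_x <= 0:
        raise ValueError("xnumel, xblock and max_grid_x must be positive")
    # Each doubling roughly halves the number of blocks along x, so the
    # loop ends once the block count drops to the limit or below.
    while cdiv(xnumel, xblock) > max_grid_x:
        xblock *= 2
    return xblock


if __name__ == "__main__":
    # With a small artificial limit the scaling is easy to see.
    print(scale_xblock(xnumel=1000, xblock=1, max_grid_x=100))
```

Doubling keeps XBLOCK a power of two when it starts as one, which is the usual shape for Triton block sizes. A real heuristic would likely also cap XBLOCK and account for the reduction block size.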

pruthvistony commented 1 month ago

@jataylo Can it be upstreamed?