Closed jataylo closed 5 months ago
This was passing at the November commit https://github.com/ROCmSoftwarePlatform/triton/commit/e8a35b3968780e48df1374482d56cc6cdbb9e351
Using rocm/pytorch-nightly:latest with a fresh container and only a freshly cloned triton mounted, I got the error below on MI250X.
It seems to break all the triton tutorials?
(py_3.8) root@hyd-7c-ZT09-02:/home/test/triton/python/tutorials# python 03-matrix-multiplication.py
Traceback (most recent call last):
  File "03-matrix-multiplication.py", line 156, in <module>
    import triton.language as tl
  File "/home/test/triton/python/triton/language/__init__.py", line 4, in <module>
    from . import math
  File "/home/test/triton/python/triton/language/math.py", line 5, in <module>
    from . import core
  File "/home/test/triton/python/triton/language/core.py", line 8, in <module>
    from .._C.libtriton.triton import ir
ModuleNotFoundError: No module named 'triton._C.libtriton'
With rocm/pytorch:latest, there is no issue at all.
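For anyone comparing the two images, a quick check like the one below may help confirm whether the compiled extension named in the traceback can be located at all. This is only a sketch: `submodule_available` is a hypothetical helper, not part of Triton, and it only covers import-time failures (a shared library that exists but fails to load for another reason could still raise something else).

```python
import importlib.util


def submodule_available(package: str, submodule: str) -> bool:
    """Return True if `package.submodule` can be located, False otherwise.

    Hypothetical helper for illustration; it is not a Triton API.
    """
    try:
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ImportError:
        # find_spec imports the parent package first; a missing or broken
        # parent surfaces as ImportError / ModuleNotFoundError.
        return False


if __name__ == "__main__":
    # Module path taken from the traceback above.
    print(submodule_available("triton", "_C.libtriton"))
```

Running this in both containers should show whether the source checkout is missing its built extension, as opposed to the tutorials themselves being broken.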
This should fix the above tests and reproducer. https://github.com/xiaohuguo2023/tritontest/blob/main/reproducer_bs.py and https://github.com/xiaohuguo2023/pytorch/tree/pt-inductorUT-fix
Thanks @xiaohuguo2023, with this workaround we pass these tests.
Please keep us in the loop on any findings from the investigation into why this fails only for us, and why it used to pass, so we can best decide on an upstream strategy. If we adopt this in PyTorch, we would need conditional implementations for ROCm/NV unless there is a fix at the triton level.
The latest upstream openai/triton has changed the triton.compile interface.
In triton-mlir: def compile(fn, **kwargs):
In upstream openai: def compile(src, target=None, options=None):
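If a caller has to work against both interfaces for a while, one option is to inspect the signature and dispatch on it. The sketch below uses stand-in functions that only mirror the two signatures quoted above; `uses_kwargs_compile` and `call_compile` are hypothetical names, and forwarding `options` as keyword arguments to the older style is an assumption, not something either fork documents.

```python
import inspect


def uses_kwargs_compile(compile_fn) -> bool:
    """True if compile_fn accepts **kwargs, like `compile(fn, **kwargs)`."""
    params = inspect.signature(compile_fn).parameters.values()
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params)


# Stand-ins mirroring the two quoted signatures; NOT the real Triton functions.
def compile_mlir_style(fn, **kwargs):
    return ("mlir", fn, kwargs)


def compile_upstream_style(src, target=None, options=None):
    return ("upstream", src, target, options)


def call_compile(compile_fn, src, target=None, options=None):
    """Hypothetical dispatcher over the two calling conventions."""
    if uses_kwargs_compile(compile_fn):
        # Assumption: the older style takes its options as plain kwargs.
        return compile_fn(src, **(options or {}))
    return compile_fn(src, target=target, options=options)


if __name__ == "__main__":
    print(call_compile(compile_mlir_style, "kernel", options={"num_warps": 4}))
    print(call_compile(compile_upstream_style, "kernel", target="gfx90a"))
```

This keeps the conditional in one place, which may be easier to remove once both sides agree on a single interface.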
https://github.com/xiaohuguo2023/pytorch/commit/dd960611bca349db97e701d14754b83a97c0b8f0
@xiaohuguo2023 to submit PR with the binary search change.
@xiaohuguo2023 note that this one is PASSING with upstream backend at commit https://github.com/openai/triton/commit/a9bc1a36470eefafe0e2ab2503b8698f1e89e7e3.
Problem Description
Seeing ~15 PyTorch UT failures at TOT triton-mlir reporting this failure, which was previously hidden by https://github.com/ROCm/triton/issues/412
GPU
AMD Instinct MI250X
ROCm Version
ROCm 5.7.0
Steps to Reproduce
Use the rocm/pytorch-nightly:latest image and TOT triton-mlir https://github.com/ROCm/triton/commit/6aa01113db5aaedb99748cc439519c9ea562ab66
Reproducer:
Traceback