ROCm / triton

Development repository for the Triton language and compiler
MIT License

[PYTORCH UT] Assertion `!NodePtr->isKnownSentinel()' failed. #443

Closed: jataylo closed this issue 3 months ago

jataylo commented 8 months ago

Failure in test_torchinductor.py::test_fuse_large_params_cuda

Passing at the Nov 3 commit c65f1e62119b467b184e8c61b63274c5091da610, failing at current TOT (top of tree).

Reproducer: https://gist.github.com/jataylo/a6a4ab855704d7d07e423a607cdeee7c (apologies for the large kernel)

Hardware: MI250X

Full error:

python: /home/runner/work/triton/triton/llvm-project/llvm/include/llvm/ADT/ilist_iterator.h:138: llvm::ilist_iterator::reference llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void, false>, false, false>::operator*() const [OptionsT = llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void, false>, IsReverse = false, IsConst = false]: Assertion `!NodePtr->isKnownSentinel()' failed.
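For context on the assertion itself: LLVM's intrusive lists terminate in a sentinel node, and the check fires when an iterator that points at that sentinel (that is, `end()`) is dereferenced. The sketch below is a simplified Python model of that situation, not LLVM code. `InstrList`, `deref`, and `first_after` are hypothetical names made up for illustration, and the buggy pattern shown is only one plausible way to reach such a check, not a claim about what the failing pass does.

```python
class Node:
    """List node; the sentinel carries no value and marks end()."""
    def __init__(self, value=None, is_sentinel=False):
        self.value = value
        self.is_sentinel = is_sentinel
        self.next = self


class InstrList:
    """Hypothetical circular list whose sentinel node plays the role of end()."""
    def __init__(self, values):
        self.sentinel = Node(is_sentinel=True)
        tail = self.sentinel
        for v in values:
            node = Node(v)
            tail.next = node
            tail = node
        tail.next = self.sentinel  # close the circle back to the sentinel

    def begin(self):
        return self.sentinel.next

    def end(self):
        return self.sentinel


def deref(node):
    # Models the spirit of the failing check: end() holds no element.
    if node.is_sentinel:
        raise AssertionError("dereferenced the end() iterator")
    return node.value


def first_after(lst, target):
    """Buggy pattern: steps past `target` without comparing against end()."""
    it = lst.begin()
    while deref(it) != target:
        it = it.next
    return deref(it.next)  # raises when `target` is the last element
```

With a two-element list, `first_after(lst, "a")` returns the second element, while `first_after(lst, "b")` walks onto the sentinel and raises, which is the analogue of the abort shown above.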

cc: @jayfurmanek @zhanglx13

jataylo commented 6 months ago

Hey @zhanglx13 are you still planning to take a look at this?

Only one UT appears to have the issue, but if it looks like a fundamental issue to you, we'd like your opinion on whether this may make us unstable or whether we can skip the UT for now and address it later.
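If skipping is the chosen route, a conditional skip keeps the test running on unaffected hardware. This is a minimal sketch using the standard `unittest.skipIf` decorator; the `ON_AFFECTED_ARCH` flag, the class name, and the placeholder test body are assumptions for illustration and do not reflect how the PyTorch test suite actually gates its tests.

```python
import unittest

# Hypothetical flag: a real suite would derive this from a device query
# (for example, the GPU architecture string). Hard-coded here for the sketch.
ON_AFFECTED_ARCH = True


class FuseLargeParamsTest(unittest.TestCase):
    @unittest.skipIf(ON_AFFECTED_ARCH, "LLVM backend assertion, see ROCm/triton#443")
    def test_fuse_large_params(self):
        # Placeholder body; the real test compiles a large fused kernel.
        self.assertTrue(True)


def run_suite():
    """Run the test case and return the unittest.TestResult for inspection."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FuseLargeParamsTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Putting the issue number in the skip reason makes it easy to find and re-enable the test once the compiler fix lands.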

zhanglx13 commented 6 months ago

@jataylo I tried the reproducer on MI300 and it works. I also tried it on MI250X, where it failed because the arch_info cannot be printed. So I don't think this is a fundamental issue, given that it works on newer GPUs.

zhanglx13 commented 4 months ago

Opened a ticket with the LLVM compiler team. It seems to be a bug in the si-lower-sgpr-spills pass.

zhanglx13 commented 4 months ago

PR is up on the llvm side: https://github.com/llvm/llvm-project/pull/88828