jjsjann123 opened 1 month ago
Can you please run the fusion without the preseg pass but with NVFUSER_ENABLE=id_model? The lowering will construct an IdModel, so if it also fails, it shouldn't matter whether it's preseg or not.
It is indeed failing. This is running on top of #2252 (where preseg no longer fails); you can see the added assert firing during GpuLower.
root@124a06e5d7bc:/volume# NVFUSER_DISABLE=parallel_compile NVFUSER_ENABLE=id_model python repro_nvfuser.py
Traceback (most recent call last):
File "/volume/repro_nvfuser.py", line 107, in <module>
fd.execute(inputs)
File "/opt/pytorch/nvfuser/nvfuser/__init__.py", line 200, in execute
result = self._execute(
RuntimeError: replay != nullptr INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/id_model/id_model.cpp":1266, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. no replay found
Exception raised from addReplayAs at /opt/pytorch/nvfuser/csrc/id_model/id_model.cpp:1266 (most recent call first):
frame #0: nvfuser::nvfCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x7f (0x7fce7ab6bb92 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #1: nvfuser::nvfErrorFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x5d (0x7fce7ab6bdd3 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x8883ca (0x7fce7ace33ca in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x8874fe (0x7fce7ace24fe in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x88529a (0x7fce7ace029a in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x885099 (0x7fce7ace0099 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #6: <unknown function> + 0x886195 (0x7fce7ace1195 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x881494 (0x7fce7acdc494 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #8: <unknown function> + 0x5519c2 (0x7fce7a9ac9c2 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #9: nvfuser::GpuLower::GpuLower(nvfuser::Fusion*, nvfuser::CompileParams const&) + 0x5ee (0x7fce7a9abbcc in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #10: <unknown function> + 0x73e529 (0x7fce7ab99529 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #11: nvfuser::FusionExecutor::compileFusion(nvfuser::Fusion*, nvfuser::KernelArgumentHolder const&, nvfuser::LaunchParams const&, nvfuser::CompileParams, nvfuser::ScheduleHeuristic, long, long, long, long) + 0x638 (0x7fce7ab792d4 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #12: <unknown function> + 0xa1992d (0x7fce7ae7492d in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #13: nvfuser::FusionKernelRuntime::compileFusionParallel(nvfuser::KernelArgumentHolder) + 0x445 (0x7fce7ae740ab in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #14: nvfuser::FusionExecutorCache::runFusionWithInputs(c10::ArrayRef<c10::IValue> const&, std::optional<nvfuser::PrimDataType>, std::optional<signed char>) + 0x4e2 (0x7fce7ae6ec80 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #15: nvfuser::python_frontend::FusionDefinition::execute(c10::ArrayRef<c10::IValue> const&, bool, bool, std::optional<signed char>) const + 0x515 (0x7fce7b22a879 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #16: <unknown function> + 0x1a7700 (0x7fce7a602700 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #17: <unknown function> + 0x2ac89b (0x7fce7a70789b in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #18: <unknown function> + 0x29f95d (0x7fce7a6fa95d in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #19: <unknown function> + 0x24f72e (0x7fce7a6aa72e in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #20: <unknown function> + 0x24f800 (0x7fce7a6aa800 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #21: <unknown function> + 0x2d7c15 (0x7fce7a732c15 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
<omitting python frames>
frame #37: <unknown function> + 0x29d90 (0x7fd0bb4b5d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #38: __libc_start_main + 0x80 (0x7fd0bb4b5e40 in /lib/x86_64-linux-gnu/libc.so.6)
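The shell invocation above can also be driven from Python, which is handy when collecting the failure from several commits. This is a minimal sketch: the script name repro_nvfuser.py and the NVFUSER_ENABLE=id_model / NVFUSER_DISABLE=parallel_compile values come from this thread, while nvfuser_env and run_repro are hypothetical helper names. Joining several options with commas is an assumption about nvFuser's option-list syntax, and the sketch does not toggle the preseg pass.

```python
import os
import subprocess
import sys


def nvfuser_env(enable=(), disable=(), base=None):
    """Return a new environment dict with NVFUSER_ENABLE / NVFUSER_DISABLE set.

    base defaults to os.environ; it is copied, never mutated. Multiple options
    are joined with commas (an assumption about nvFuser's option-list syntax).
    """
    env = dict(os.environ if base is None else base)
    if enable:
        env["NVFUSER_ENABLE"] = ",".join(enable)
    if disable:
        env["NVFUSER_DISABLE"] = ",".join(disable)
    return env


def run_repro(script="repro_nvfuser.py"):
    """Run the repro in a fresh interpreter so the variables are set from startup."""
    env = nvfuser_env(enable=["id_model"], disable=["parallel_compile"])
    return subprocess.run(
        [sys.executable, script], env=env, capture_output=True, text=True
    )
```

Running the repro in a child process, rather than setting os.environ inside an already running interpreter, avoids depending on when nvFuser reads its options. The captured result.stderr then holds the traceback shown above.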
Thanks for checking!
@jjsjann123, I'm just curious whether the fusion in the issue description was generated by running Thunder's microbenchmark test_llama2_qkv_split_rope_7b_train? I'm seeing the same failure, but the line number is different now:
replay != nullptr INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/id_model/id_model.cpp":786
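Because the same assert now reports line 786 instead of 1266, a small helper can check whether two reports are the same failure whose line simply moved between commits. This is a minimal sketch: parse_assert, same_check and AssertSite are hypothetical names, and the regex assumes the "<condition> INTERNAL ASSERT FAILED at "<file>":<line>" message format seen in the traceback above.

```python
import os
import re
from typing import NamedTuple, Optional

# Matches '<cond> INTERNAL ASSERT FAILED at "<file>":<line>', optionally
# preceded by a Python exception prefix such as "RuntimeError: ".
_ASSERT_RE = re.compile(
    r'(?:\w+Error:\s*)?(?P<cond>.+?) INTERNAL ASSERT FAILED at "(?P<file>[^"]+)":(?P<line>\d+)'
)


class AssertSite(NamedTuple):
    condition: str
    file: str
    line: int


def parse_assert(text: str) -> Optional[AssertSite]:
    """Return the first INTERNAL ASSERT location found in text, or None."""
    m = _ASSERT_RE.search(text)
    if m is None:
        return None
    return AssertSite(m.group("cond"), m.group("file"), int(m.group("line")))


def same_check(a: AssertSite, b: AssertSite) -> bool:
    """True if both failures hit the same condition in the same source file,
    even if the line number drifted between commits."""
    return a.condition == b.condition and (
        os.path.basename(a.file) == os.path.basename(b.file)
    )
```

Comparing the condition and file name while ignoring the line number is a heuristic: two different asserts with the same condition text in one file would also compare equal.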
Oh no... you shouldn't be running into this issue (i.e., the issue still stands, but it shouldn't pop up in codegen anymore after #2252). But it's possible that even building the EXACT graph triggers this issue.
Can you give me a repro command for running that benchmark?
I also see this problem when running the regular e2e benchmark now (current commit 8baa5505b247311a63adcca6e7fa2138929c8650):
python thunder/benchmarks/benchmark_litgpt.py --compile=thunder --micro_batch_size=1 --model_name=Llama-2-7b-hf --n_layers=1
The pattern from the benchmark looks similar to what we have in this repro. I think it's caused by an accidental change here: https://github.com/NVIDIA/Fuser/pull/2298#discussion_r1616371356
Not sure if this is a real issue or just a misuse of IdModel.
Here's the repro script below (it will likely be rendered obsolete after #2252).
Basically, passing the fusion from the program below to the IdModel constructor triggers an assert during buildLoopGraph(). I can help get a C++ test if this turns out to be a real issue. cc'ing @naoyam