Closed · @wujingyue closed this issue 1 week ago.
gdb gives a more useful stack trace. Apparently, this error came from ExactMappedExtentSubstitutionPass. cc @liqiangxl, who appears to be the author (https://github.com/NVIDIA/Fuser/pull/1642).
(gdb) bt
#0 0x00007ffff77c435a in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007ffedc5d249e in nvfuser::nvfCheckFail (func=0x7ffedd167f76 "maybeMutated", file=0x7ffedd167f50 "/opt/pytorch/nvfuser/csrc/mutator.cpp", line=45, msg=" INTERNAL ASSERT FAILED at \"/opt/pytorch/nvfuser/csrc/mutator.cpp\":45, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. Two-hop mutations are not supported. "...) at /opt/pytorch/nvfuser/csrc/exceptions.cpp:275
#2 0x00007ffedc5d2728 in nvfuser::nvfErrorFail (func=0x7ffedd167f76 "maybeMutated", file=0x7ffedd167f50 "/opt/pytorch/nvfuser/csrc/mutator.cpp", line=45, condMsg=0x7ffedd167ea8 " INTERNAL ASSERT FAILED at \"/opt/pytorch/nvfuser/csrc/mutator.cpp\":45, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. ", userMsg="Two-hop mutations are not supported. Found registrations from 5 to 5 to 5") at /opt/pytorch/nvfuser/csrc/exceptions.cpp:301
#3 0x00007ffedca2c1d0 in nvfuser::OptOutMutator::maybeMutated (this=0x7fffffffbaf0, val=0x7fff38b8b180) at /opt/pytorch/nvfuser/csrc/mutator.cpp:45
#4 0x00007ffedca2c8d8 in nvfuser::OptOutMutator::mutate (this=0x7fffffffbaf0, id=0x7ffec6923580) at /opt/pytorch/nvfuser/csrc/mutator.cpp:101
#5 0x00007ffedc4d4e97 in nvfuser::Val::mutatorDispatch<nvfuser::OptOutMutator*> (mutator=0x7fffffffbaf0, val=0x7ffec6923580) at /opt/pytorch/nvfuser/csrc/dispatch.cpp:206
#6 0x00007ffedca2bfc1 in nvfuser::OptOutMutator::dispatchMutate (this=0x7fffffffbaf0, v=0x7ffec6923580) at /opt/pytorch/nvfuser/csrc/mutator.cpp:33
#7 0x00007ffedc935b84 in nvfuser::ir_utils::(anonymous namespace)::ValReplacementMutator::dispatchMutate (this=0x7fffffffbaf0, val=0x7ffec6923580) at /opt/pytorch/nvfuser/csrc/ir/utils.cpp:492
#8 0x00007ffedc4d4ae4 in nvfuser::Statement::mutatorDispatch<nvfuser::OptOutMutator*> (mutator=0x7fffffffbaf0, stmt=0x7ffec6923580) at /opt/pytorch/nvfuser/csrc/dispatch.cpp:230
#9 0x00007ffedca2bf97 in nvfuser::OptOutMutator::dispatchMutate (this=0x7fffffffbaf0, s=0x7ffec6923580) at /opt/pytorch/nvfuser/csrc/mutator.cpp:29
#10 0x00007ffedc9358f5 in nvfuser::ir_utils::(anonymous namespace)::ValReplacementMutator::ValReplacementMutator (this=0x7fffffffbaf0, fusion=0x7ffede572c00, replacement_map=std::unordered_map with 5 elements = {...}) at /opt/pytorch/nvfuser/csrc/ir/utils.cpp:476
#11 0x00007ffedc936105 in nvfuser::ir_utils::replaceValue (fusion=0x7ffede572c00, replacement_map=std::unordered_map with 5 elements = {...}) at /opt/pytorch/nvfuser/csrc/ir/utils.cpp:533
#12 0x00007ffedcaddddc in nvfuser::preseg_passes::(anonymous namespace)::exactMappedExtentSubstitution (fusion=0x7ffede572c00) at /opt/pytorch/nvfuser/csrc/preseg_passes/exact_mapped_extent_substitution.cpp:81
#13 0x00007ffedcade00c in nvfuser::preseg_passes::ExactMappedExtentSubstitutionPass::runPass (fusion=0x7ffede572c00) at /opt/pytorch/nvfuser/csrc/preseg_passes/exact_mapped_extent_substitution.cpp:95
#14 0x00007ffedcafc39e in nvfuser::preseg_passes::OptimizationPass<nvfuser::preseg_passes::ExactMappedExtentSubstitutionPass>::runPass (fusion=0x7ffede572c00) at /opt/pytorch/nvfuser/csrc/preseg_passes/optimization_pass.h:54
#15 0x00007ffedcafa8a4 in nvfuser::preseg_passes::PreSegmenter::runPass (fusion=0x7ffede572c00) at /opt/pytorch/nvfuser/csrc/preseg_passes/pre_segmenter.cpp:66
#16 0x00007ffedcbc09a5 in nvfuser::preseg_passes::OptimizationPass<nvfuser::preseg_passes::PreSegmenter>::runPass (fusion=0x7ffede572c00) at /opt/pytorch/nvfuser/csrc/preseg_passes/optimization_pass.h:54
#17 0x00007ffedcbb7950 in nvfuser::FusionKernelRuntime::FusionKernelRuntime (this=0x7fff38b8c080, fusion=std::unique_ptr<nvfuser::Fusion> = {...}, args=..., serde_buffer=0x0, forced_index_type=std::optional [no contained value], fusion_id=0, concrete_id=1, runtime_id=0, auto_schedule=true) at /opt/pytorch/nvfuser/csrc/runtime/fusion_kernel_runtime.cpp:75
#18 0x00007ffedcbaa0ad in std::make_unique<nvfuser::FusionKernelRuntime, std::unique_ptr<nvfuser::Fusion, std::default_delete<nvfuser::Fusion> >, nvfuser::KernelArgumentHolder const&, decltype(nullptr), std::optional<nvfuser::PrimDataType>&, long&, long&, unsigned long, bool const&>(std::unique_ptr<nvfuser::Fusion, std::default_delete<nvfuser::Fusion> >&&, nvfuser::KernelArgumentHolder const&, decltype(nullptr)&&, std::optional<nvfuser::PrimDataType>&, long&, long&, unsigned long&&, bool const&) () at /usr/include/c++/13/bits/unique_ptr.h:1070
#19 0x00007ffedcba4e2b in nvfuser::FusionExecutorCache::getKernelRuntimeFor (this=0x7ffeddbd3000, args=..., forced_index_type=std::optional [no contained value]) at /opt/pytorch/nvfuser/csrc/runtime/fusion_executor_cache.cpp:653
#20 0x00007ffedcba191f in nvfuser::FusionExecutorCache::runFusionWithInputs (this=0x7ffeddbd3000, inputs=..., forced_index_type=std::optional [no contained value], selected_device=std::optional [no contained value]) at /opt/pytorch/nvfuser/csrc/runtime/fusion_executor_cache.cpp:58
#21 0x00007ffedceb1e84 in nvfuser::python_frontend::FusionDefinition::execute (this=0x7ffedfdf0580, inputs=..., selected_device=std::optional [no contained value], override_user_schedule=false, capture_debug_output=false, profile=false) at /opt/pytorch/nvfuser/csrc/python_frontend/fusion_definition.cpp:414
#22 0x00007ffedbe9efc8 in operator() (__closure=0x7ffeddc3ce58, self=..., iter=..., device=std::optional [no contained value], override_user_schedule=false, capture_debug_output=false, profile=false) at /opt/pytorch/nvfuser/csrc/python_frontend/python_bindings.cpp:1044
#23 0x00007ffedbfd433e in pybind11::detail::argument_loader<nvfuser::python_frontend::FusionDefinition&, pybind11::iterable const&, std::optional<long>, bool, bool, bool>::call_impl<std::vector<at::Tensor>, nvfuser::python_frontend::initNvFuserPythonBindings(PyObject*)::<lambda(nvfuser::python_frontend::FusionDefinition&, const pybind11::iterable&, std::optional<long int>, bool, bool, bool)>&, 0, 1, 2, 3, 4, 5, pybind11::detail::void_type>(struct {...} &, std::index_sequence, pybind11::detail::void_type &&) (this=0x7fffffffce90, f=...) at /usr/local/lib/python3.12/dist-packages/torch/include/pybind11/cast.h:1631
#24 0x00007ffedbfc4a61 in pybind11::detail::argument_loader<nvfuser::python_frontend::FusionDefinition&, pybind11::iterable const&, std::optional<long>, bool, bool, bool>::call<std::vector<at::Tensor>, pybind11::detail::void_type, nvfuser::python_frontend::initNvFuserPythonBindings(PyObject*)::<lambda(nvfuser::python_frontend::FusionDefinition&, const pybind11::iterable&, std::optional<long int>, bool, bool, bool)>&>(struct {...} &) (this=0x7fffffffce90, f=...)
at /usr/local/lib/python3.12/dist-packages/torch/include/pybind11/cast.h:1600
#25 0x00007ffedbf68ac6 in operator() (__closure=0x0, call=...) at /usr/local/lib/python3.12/dist-packages/torch/include/pybind11/pybind11.h:278
#26 0x00007ffedbf68bac in _FUN () at /usr/local/lib/python3.12/dist-packages/torch/include/pybind11/pybind11.h:249
#27 0x00007ffedc006906 in pybind11::cpp_function::dispatcher (self=0x7fff31d7f750, args_in=0x7ffff6c4e740, kwargs_in=0x7fff23d1e500) at /usr/local/lib/python3.12/dist-packages/torch/include/pybind11/pybind11.h:971
#28 0x00000000005821ef in ?? ()
#29 0x0000000000548f8e in _PyObject_MakeTpCall ()
#30 0x00000000005d7819 in _PyEval_EvalFrameDefault ()
#31 0x00000000005d5d2b in PyEval_EvalCode ()
#32 0x0000000000608e12 in ?? ()
#33 0x00000000006b5253 in ?? ()
#34 0x00000000006b4fba in _PyRun_SimpleFileObject ()
#35 0x00000000006b4def in _PyRun_AnyFileObject ()
#36 0x00000000006bce95 in Py_RunMain ()
#37 0x00000000006bc97d in Py_BytesMain ()
#38 0x00007ffff79b31ca in ?? () from /usr/lib/x86_64-linux-gnu/libc.so.6
#39 0x00007ffff79b328b in __libc_start_main () from /usr/lib/x86_64-linux-gnu/libc.so.6
#40 0x00000000006584a5 in _start ()
cc @Priya2698 as well because this is related to linear.
@liqiangxl I think at this line we should loop over replacement_map and chase references so that all of the map's entries point to leaves: https://github.com/NVIDIA/Fuser/blob/6abf3101ba1769123d76ab90f1b9661ff3772287/csrc/preseg_passes/exact_mapped_extent_substitution.cpp#L79-L81
I wonder if we should just do this automatically in replaceValue so that we don't have to handle it in every use case.
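The reference-chasing idea could look something like the sketch below. It is a minimal Python model of the proposed fix, using plain hashable keys in place of nvFuser's Val*; flatten_to_leaves is a hypothetical name, not an nvFuser API.

```python
def flatten_to_leaves(replacement_map):
    """Rewrite every entry of a replacement map so it points directly to its
    leaf, eliminating chains like a -> b -> c that would otherwise trigger the
    'Two-hop mutations are not supported' assertion in OptOutMutator."""
    flat = {}
    for key in replacement_map:
        # Chase references until we reach a value that is not itself a key.
        # The `seen` set guards against cycles (e.g. a self-mapping 5 -> 5).
        leaf = replacement_map[key]
        seen = {key}
        while leaf in replacement_map and leaf not in seen:
            seen.add(leaf)
            leaf = replacement_map[leaf]
        flat[key] = leaf
    return flat
```

For example, flatten_to_leaves({"a": "b", "b": "c"}) yields {"a": "c", "b": "c"}, so the mutator only ever sees single-hop registrations.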
The issue arises because the extents in all three of these disjoint sets are 5. We don't need to add constant values to the replacement map, because if two IterDomains are in the same disjoint set, they must have the same extent.
disjoint sets{
{ iS5{5}; iS0{5}; iS2{5} }
{ rS7{5}; iS1{5}; iS3{5} }
{ iS6{5}; iS4{5} }
}
@naoyam also suggested that we may want to build a DisjointSets of extents based on the DisjointSets of IterDomains, since the IterDomain sets being disjoint doesn't guarantee that their extents are also disjoint.
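One way to read that suggestion: for each disjoint set of IterDomains, union the extents of all its members into a second union-find keyed on extents. Below is a minimal Python sketch; UnionFind and build_extent_sets are illustrative stand-ins, not nvFuser's DisjointSets API.

```python
class UnionFind:
    """Tiny union-find, standing in for nvFuser's DisjointSets."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Path-halving find; unseen elements become their own root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def build_extent_sets(id_sets, extent_of):
    """Union the extents of all IterDomains that share a disjoint set."""
    uf = UnionFind()
    for id_set in id_sets:
        ids = list(id_set)
        for other in ids[1:]:
            uf.union(extent_of(ids[0]), extent_of(other))
    return uf
```

With the three IterDomain sets from the dump above, every set's member extents end up merged, so the substitution pass can pick one representative extent per extent set instead of chaining replacements.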
I need to use customized hash & equal functions with DisjointSets to ensure that constant extents are hashed by their constant value instead of by their pointer address. Results when using DisjointSets<Val*, ValPtrHash, ValPtrEqual> extent_sets:
============id_sets==================
disjoint sets{
{ iS5{5}; iS0{5}; iS2{5} }
{ rS7{5}; iS1{5}; iS3{5} }
{ iS6{5}; iS4{5} }
}
==============================
============extent_set==================
disjoint sets{
{ 5 }
}
Otherwise, these extents are hashed differently. Results when using DisjointSets<Val*> extent_sets:
============extent_set==================
Extent sets: disjoint sets{
{ 5; 5; 5 }
{ 5; 5 }
}
This can still solve the original bug, even though these vals have different addresses, so DisjointSets<Val*, ValPtrHash, ValPtrEqual> extent_sets is not required.
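The difference between the two runs above boils down to value-based versus identity-based hashing. Here is a Python analogue of that distinction; ConstVal and ByValue are hypothetical stand-ins for a constant Val* and the ValPtrHash/ValPtrEqual functors, not real nvFuser types.

```python
class ConstVal:
    """Stand-in for an nvFuser Val holding a constant; each instance is a
    distinct object, like a distinct Val* in the fusion IR."""

    def __init__(self, value):
        self.value = value


class ByValue:
    """Wrapper that hashes/compares a ConstVal by its constant value, in the
    spirit of ValPtrHash/ValPtrEqual; a real implementation would fall back
    to pointer identity for non-constant vals."""

    def __init__(self, val):
        self.val = val

    def __hash__(self):
        return hash(self.val.value)

    def __eq__(self, other):
        return self.val.value == other.val.value


fives = [ConstVal(5) for _ in range(3)]
# Identity-based keys: three distinct entries, like DisjointSets<Val*>.
identity_keys = {id(v) for v in fives}
# Value-based keys: one entry, like the custom hash & equal variant.
value_keys = {ByValue(v) for v in fives}
```

With default (pointer-style) hashing the three constant 5s stay in separate sets, matching the `{ 5; 5; 5 }` output above; with value-based hashing they collapse into a single `{ 5 }` set.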
I noticed this in the latest CI run of @IvanYashchuk's https://github.com/Lightning-AI/lightning-thunder/pull/1371. Apparently, it failed in a pre-segmenter pass.
Repro: