NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

OpInfo has problems testing define_tensor. #3225

Open wujingyue opened 3 days ago

wujingyue commented 3 days ago

Context: https://github.com/NVIDIA/Fuser/pull/3222/files#diff-577ed6d3703dbc615028823a5113fdef10881ffb1247b9a79c7f17270650124fR11-R14

To repro, patch https://github.com/NVIDIA/Fuser/commit/b0ccb481a7d60f944abf4b6b164f625fec31d147 and run

pytest tests/python/test_ops.py -k define_tensor

cc @jjsjann123 and @rdspring1

wujingyue commented 3 days ago

Since #3222 is merged, you can now reproduce this by doing the following:

$ git checkout wjy/define

$ pytest tests/python/test_ops.py -k test_correctness_define_tensor_float32 -s
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-8.1.1, pluggy-1.5.0
Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket=<bucket_type>
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /opt/pytorch/nvfuser
plugins: xdist-3.6.1, timestamper-0.0.10, hypothesis-6.112.2, cov-5.0.0, timeout-2.3.1, random-order-1.1.1, mpi-0.6, benchmark-4.0.0, shard-0.1.2, typeguard-4.3.0
collected 896 items / 895 deselected / 1 selected
Running 1 items in this shard

tests/python/test_ops.py F

=================================== FAILURES ===================================
____________________ test_correctness_define_tensor_float32 ____________________

    def test():
        # Ref: https://github.com/pytorch/pytorch/blob/aa8ea1d787a9d21b064b664c5344376265feea6c/torch/testing/_internal/common_utils.py#L2251-L2263
        # > CUDA device side error will cause subsequence test cases to fail.
        # > stop entire test suite if catches RuntimeError during torch.cuda.synchronize().
        if torch.cuda.is_initialized():
            try:
                torch.cuda.synchronize()
            except RuntimeError as rte:
                pytest.exit(
                    "TEST SUITE EARLY TERMINATION due to torch.cuda.synchronize() failure"
                )

>       return template(opinfo, dtype)

tests/python/opinfo_framework.py:30:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/python/test_ops.py:215: in test_correctness
    return serde_test_fn(op, dtype)
tests/python/test_ops.py:206: in serde_test_fn
    result = correctness_test_fn(op.reference_type, op, sample)
tests/python/test_ops.py:190: in correctness_test_fn
    return torch_correctness_test_fn(_fd_fn, nvf_op, sample)
tests/python/test_ops.py:86: in torch_correctness_test_fn
    nvfuser_result = fd.execute(inputs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =
def nvfuser_fusion_id0(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(shape=[1, -1], contiguity=[None, Tr...ue, None], dtype=DataType.Float, is_cpu=False, stride_order=[0, 1])
    T2 = fd.ops.add(T0, T1)
    fd.add_output(T2)

, inputs = [tensor([[-6.7103,  5.7013]], device='cuda:0')]

    def execute(
        self,
        inputs,
        *,
        device=None,
        override_user_schedule=False,
        capture_debug_output=False,
        print_repro=False,
        profile=False,
        save_repro_inputs=False,
    ):
        """
        Executes an nvFuser set of kernels for a given Fusion

        The FusionDefinition will be executed on a single CUDA device.
        Typically, which device to run on is determined by the devices where
        the input tensors reside. However, if the Fusion is defined such that
        none of the inputs are tensors, we are not able to infer a device from
        the inputs. For example, the following FusionDefinition will be unable
        to unambiguously infer the device of its output:

            with FusionDefinition() as fd:
                tv1 = fd.ops.full([5])
                fd.add_output(tv1)

        In that case, we default to selecting the first CUDA
        device, i.e. `torch.device("cuda:0")`. This method enables selecting an
        alternative preferred device.

        Args:
            inputs (List[Union[Tensor, Scalar]]): A list of inputs to fusion.

        Kwargs:
            device (Optional[Union[int, str, torch.device]]): This is a hint to run
                the Fusion on the given CUDA device. This is not typically
                necessary, as the device is usually inferred from the locations
                of input tensors. However, for some fusion definitions, no
                tensors will be input (for example when all tensors are
                generated with `full` or `uniform` ops). In these cases, we
                must either tell NVFuser where to run the resulting kernel, or
                let it default to 0. Note that passing this option providing
                and input tensors that lie on another device is an error.
            override_user_schedule (bool): For a user defined schedule,
                override with auto-generated schedule (default: False)
            capture_debug_output (bool): Whether to capture any printed
                debugging information as a string. If True, the string can be
                retrieved after execution using :meth:`get_debug_output`. If False,
                then that method will return None when called.
            print_repro (bool): Prints a reproduction script to stdout.
            profile (bool): Captures a CUPTI based profile of a fusion.
            save_repro_inputs (bool): Saves the inputs for last_repro_script() to
                provide a provide a reproduction script.

        Returns:
            List[Tensor]
        """
        self.profiled = profile

        if device is not None:
            if not isinstance(device, torch.device):
                device = torch.device(device)
            assert (
                device.type == "cuda"
            ), "If device argument is passed it must be a CUDA device"
            device = device.index

        # if definition is not defined by a context manager, try a child class
        if self.id() is None:
            self._setup_definition()
            self.definition()
            self._finalize_definition()

        defined_multidevice_schedule = hasattr(
            self, "multidevice_schedule"
        ) and isinstance(self.multidevice_schedule, Callable)
        defined_schedule = hasattr(self, "schedule") and isinstance(
            self.schedule, Callable
        )
        assert not (
            defined_multidevice_schedule and defined_schedule
        ), "I haven't tested what if both are defined. We don't plan to support this use case although it may just work."

        if defined_multidevice_schedule:
            # Unlike `schedule`, `multidevice_schedule` is designed for inter-device
            # scheduling, The scheduling is done before concretization and therefore
            # before pre-segmentation. `schedule` however assumes the FusionDefinition
            # has been concretized and pre-segmented, and therefore requires
            # `_setup_schedule` and `_finalize_schedule` to be called before and after.
            #
            # Note: there's a plan to embed multidevice schedules into FusionDefinition
            # as annotating nodes. This may eventually replace `multidevice_schedule`.
            self.multidevice_schedule()

        # If schedule is defined by child class and schedule is not defined for
        # inputs, make a schedule.
        if defined_schedule:
            # Schedule fusion if it does not exist yet or profiling fusion
            if profile or not self._exist_schedule(inputs):
                self._setup_schedule(inputs, overwrite_existing_schedule=profile)
                self.schedule()
                self._finalize_schedule(inputs)

        if save_repro_inputs:
            from torch._subclasses.fake_tensor import FakeTensorMode

            fake_mode = FakeTensorMode()
            self.fake_inputs = [fake_mode.from_tensor(inp) for inp in inputs]

        results = None
        try:
>           results = self._execute(
                inputs,
                device=device,
                override_user_schedule=override_user_schedule,
                capture_debug_output=capture_debug_output,
                profile=profile,
            )
E           RuntimeError:  INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/runtime/executor_utils.cpp":708, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. KernelArgumentHolder contains less argument than kernel's input.
E           Exception raised from bindInputs at /opt/pytorch/nvfuser/csrc/runtime/executor_utils.cpp:708 (most recent call first):
E           frame #0: nvfuser::nvfCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xf3 (0x7ff7946f48e7 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
E           frame #1: nvfuser::nvfErrorFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x53 (0x7ff794aac533 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
E           frame #2: nvfuser::executor_utils::bindInputs(nvfuser::KernelArgumentHolder const&, nvfuser::Fusion*) + 0xb3a (0x7ff794d8fb3a in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
E           frame #3: <unknown function> + 0x7f41cc (0x7ff794da91cc in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
E           frame #4: nvfuser::FusionExecutorCache::runFusionWithInputs(c10::ArrayRef<c10::IValue> const&, std::optional<nvfuser::PrimDataType>, std::optional<signed char>) + 0xa9 (0x7ff794daaa39 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
E           frame #5: nvfuser::python_frontend::FusionDefinition::execute(c10::ArrayRef<c10::IValue> const&, std::optional<signed char>, bool, bool, bool) const + 0x796 (0x7ff794f195a6 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
E           frame #6: <unknown function> + 0x1cc00e (0x7ff79478100e in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
E           frame #7: <unknown function> + 0x24a21f (0x7ff7947ff21f in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
E           frame #8: <unknown function> + 0x2df550 (0x7ff794894550 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
E           frame #9: <unknown function> + 0x15cb2e (0x57fe59be9b2e in /usr/bin/python3)
E           frame #10: _PyObject_MakeTpCall + 0x25b (0x57fe59be02db in /usr/bin/python3)
E           frame #11: <unknown function> + 0x16b55b (0x57fe59bf855b in /usr/bin/python3)
E           frame #12: _PyEval_EvalFrameDefault + 0x1983 (0x57fe59bd3b93 in /usr/bin/python3)
E           frame #13: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #14: _PyEval_EvalFrameDefault + 0x8ab (0x57fe59bd2abb in /usr/bin/python3)
E           frame #15: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #16: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
E           frame #17: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #18: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
E           frame #19: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #20: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
E           frame #21: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #22: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
E           frame #23: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #24: _PyEval_EvalFrameDefault + 0x285e (0x57fe59bd4a6e in /usr/bin/python3)
E           frame #25: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #26: _PyEval_EvalFrameDefault + 0x285e (0x57fe59bd4a6e in /usr/bin/python3)
E           frame #27: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #28: _PyEval_EvalFrameDefault + 0x613a (0x57fe59bd834a in /usr/bin/python3)
E           frame #29: <unknown function> + 0x16b281 (0x57fe59bf8281 in /usr/bin/python3)
E           frame #30: _PyEval_EvalFrameDefault + 0x613a (0x57fe59bd834a in /usr/bin/python3)
E           frame #31: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #32: _PyObject_FastCallDictTstate + 0x16d (0x57fe59bdf51d in /usr/bin/python3)
E           frame #33: _PyObject_Call_Prepend + 0x5c (0x57fe59bf52bc in /usr/bin/python3)
E           frame #34: <unknown function> + 0x2826d0 (0x57fe59d0f6d0 in /usr/bin/python3)
E           frame #35: _PyObject_MakeTpCall + 0x25b (0x57fe59be02db in /usr/bin/python3)
E           frame #36: _PyEval_EvalFrameDefault + 0x72ea (0x57fe59bd94fa in /usr/bin/python3)
E           frame #37: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #38: _PyEval_EvalFrameDefault + 0x8ab (0x57fe59bd2abb in /usr/bin/python3)
E           frame #39: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #40: _PyEval_EvalFrameDefault + 0x285e (0x57fe59bd4a6e in /usr/bin/python3)
E           frame #41: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #42: _PyEval_EvalFrameDefault + 0x613a (0x57fe59bd834a in /usr/bin/python3)
E           frame #43: <unknown function> + 0x16b281 (0x57fe59bf8281 in /usr/bin/python3)
E           frame #44: _PyEval_EvalFrameDefault + 0x613a (0x57fe59bd834a in /usr/bin/python3)
E           frame #45: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #46: _PyObject_FastCallDictTstate + 0x16d (0x57fe59bdf51d in /usr/bin/python3)
E           frame #47: _PyObject_Call_Prepend + 0x5c (0x57fe59bf52bc in /usr/bin/python3)
E           frame #48: <unknown function> + 0x2826d0 (0x57fe59d0f6d0 in /usr/bin/python3)
E           frame #49: PyObject_Call + 0xbb (0x57fe59bf8ebb in /usr/bin/python3)
E           frame #50: _PyEval_EvalFrameDefault + 0x285e (0x57fe59bd4a6e in /usr/bin/python3)
E           frame #51: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #52: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
E           frame #53: <unknown function> + 0x16b281 (0x57fe59bf8281 in /usr/bin/python3)
E           frame #54: _PyEval_EvalFrameDefault + 0x1983 (0x57fe59bd3b93 in /usr/bin/python3)
E           frame #55: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #56: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
E           frame #57: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #58: _PyEval_EvalFrameDefault + 0x1983 (0x57fe59bd3b93 in /usr/bin/python3)
E           frame #59: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #60: _PyEval_EvalFrameDefault + 0x285e (0x57fe59bd4a6e in /usr/bin/python3)
E           frame #61: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
E           frame #62: _PyEval_EvalFrameDefault + 0x613a (0x57fe59bd834a in /usr/bin/python3)
E           frame #63: <unknown function> + 0x16b281 (0x57fe59bf8281 in /usr/bin/python3)

nvfuser/__init__.py:181: RuntimeError
------------------------------ Captured log call -------------------------------
ERROR    nvfuser:__init__.py:192 An error occurred while executing nvFuser FusionDefinition 0.
If you believe this is a bug or need assistance, please file an issue at https://github.com/NVIDIA/Fuser/issues/new
Here's a script to reproduce the error:
```python
# CUDA devices:
#  0: NVIDIA RTX 6000 Ada Generation
#  1: NVIDIA RTX 6000 Ada Generation
# torch version: 2.6.0a0+git0eba7e5
# cuda version: 12.6
# nvfuser version: 0.2.15+gitf01caf7
import torch
from nvfuser import FusionDefinition, DataType

def nvfuser_fusion_id0(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(shape=[1, -1], contiguity=[None, True], dtype=DataType.Float, is_cpu=False, stride_order=[1, 0])
    T1 = fd.define_tensor(shape=[1, 2], contiguity=[True, None], dtype=DataType.Float, is_cpu=False, stride_order=[0, 1])
    T2 = fd.ops.add(T0, T1)
    fd.add_output(T2)

with FusionDefinition() as fd:
    nvfuser_fusion_id0(fd)

inputs = [
    torch.testing.make_tensor((1, 2), dtype=torch.float32, device='cuda:0'),
]
fd.execute(inputs)

```

Traceback (most recent call last):
  File "/opt/pytorch/nvfuser/nvfuser/__init__.py", line 181, in execute
    results = self._execute(
RuntimeError: INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/runtime/executor_utils.cpp":708, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. KernelArgumentHolder contains less argument than kernel's input.
Exception raised from bindInputs at /opt/pytorch/nvfuser/csrc/runtime/executor_utils.cpp:708 (most recent call first):
frame #0: nvfuser::nvfCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xf3 (0x7ff7946f48e7 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #1: nvfuser::nvfErrorFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x53 (0x7ff794aac533 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #2: nvfuser::executor_utils::bindInputs(nvfuser::KernelArgumentHolder const&, nvfuser::Fusion*) + 0xb3a (0x7ff794d8fb3a in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x7f41cc (0x7ff794da91cc in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #4: nvfuser::FusionExecutorCache::runFusionWithInputs(c10::ArrayRef<c10::IValue> const&, std::optional<nvfuser::PrimDataType>, std::optional<signed char>) + 0xa9 (0x7ff794daaa39 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #5: nvfuser::python_frontend::FusionDefinition::execute(c10::ArrayRef<c10::IValue> const&, std::optional<signed char>, bool, bool, bool) const + 0x796 (0x7ff794f195a6 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #6: <unknown function> + 0x1cc00e (0x7ff79478100e in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x24a21f (0x7ff7947ff21f in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #8: <unknown function> + 0x2df550 (0x7ff794894550 in /opt/pytorch/nvfuser/nvfuser/_C.cpython-310-x86_64-linux-gnu.so)
frame #9: <unknown function> + 0x15cb2e (0x57fe59be9b2e in /usr/bin/python3)
frame #10: _PyObject_MakeTpCall + 0x25b (0x57fe59be02db in /usr/bin/python3)
frame #11: <unknown function> + 0x16b55b (0x57fe59bf855b in /usr/bin/python3)
frame #12: _PyEval_EvalFrameDefault + 0x1983 (0x57fe59bd3b93 in /usr/bin/python3)
frame #13: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #14: _PyEval_EvalFrameDefault + 0x8ab (0x57fe59bd2abb in /usr/bin/python3)
frame #15: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #16: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
frame #17: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #18: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
frame #19: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #20: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
frame #21: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #22: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
frame #23: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #24: _PyEval_EvalFrameDefault + 0x285e (0x57fe59bd4a6e in /usr/bin/python3)
frame #25: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #26: _PyEval_EvalFrameDefault + 0x285e (0x57fe59bd4a6e in /usr/bin/python3)
frame #27: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #28: _PyEval_EvalFrameDefault + 0x613a (0x57fe59bd834a in /usr/bin/python3)
frame #29: <unknown function> + 0x16b281 (0x57fe59bf8281 in /usr/bin/python3)
frame #30: _PyEval_EvalFrameDefault + 0x613a (0x57fe59bd834a in /usr/bin/python3)
frame #31: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #32: _PyObject_FastCallDictTstate + 0x16d (0x57fe59bdf51d in /usr/bin/python3)
frame #33: _PyObject_Call_Prepend + 0x5c (0x57fe59bf52bc in /usr/bin/python3)
frame #34: <unknown function> + 0x2826d0 (0x57fe59d0f6d0 in /usr/bin/python3)
frame #35: _PyObject_MakeTpCall + 0x25b (0x57fe59be02db in /usr/bin/python3)
frame #36: _PyEval_EvalFrameDefault + 0x72ea (0x57fe59bd94fa in /usr/bin/python3)
frame #37: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #38: _PyEval_EvalFrameDefault + 0x8ab (0x57fe59bd2abb in /usr/bin/python3)
frame #39: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #40: _PyEval_EvalFrameDefault + 0x285e (0x57fe59bd4a6e in /usr/bin/python3)
frame #41: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #42: _PyEval_EvalFrameDefault + 0x613a (0x57fe59bd834a in /usr/bin/python3)
frame #43: <unknown function> + 0x16b281 (0x57fe59bf8281 in /usr/bin/python3)
frame #44: _PyEval_EvalFrameDefault + 0x613a (0x57fe59bd834a in /usr/bin/python3)
frame #45: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #46: _PyObject_FastCallDictTstate + 0x16d (0x57fe59bdf51d in /usr/bin/python3)
frame #47: _PyObject_Call_Prepend + 0x5c (0x57fe59bf52bc in /usr/bin/python3)
frame #48: <unknown function> + 0x2826d0 (0x57fe59d0f6d0 in /usr/bin/python3)
frame #49: PyObject_Call + 0xbb (0x57fe59bf8ebb in /usr/bin/python3)
frame #50: _PyEval_EvalFrameDefault + 0x285e (0x57fe59bd4a6e in /usr/bin/python3)
frame #51: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #52: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
frame #53: <unknown function> + 0x16b281 (0x57fe59bf8281 in /usr/bin/python3)
frame #54: _PyEval_EvalFrameDefault + 0x1983 (0x57fe59bd3b93 in /usr/bin/python3)
frame #55: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #56: _PyEval_EvalFrameDefault + 0x6bc (0x57fe59bd28cc in /usr/bin/python3)
frame #57: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #58: _PyEval_EvalFrameDefault + 0x1983 (0x57fe59bd3b93 in /usr/bin/python3)
frame #59: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #60: _PyEval_EvalFrameDefault + 0x285e (0x57fe59bd4a6e in /usr/bin/python3)
frame #61: _PyFunction_Vectorcall + 0x7c (0x57fe59bea42c in /usr/bin/python3)
frame #62: _PyEval_EvalFrameDefault + 0x613a (0x57fe59bd834a in /usr/bin/python3)
frame #63: <unknown function> + 0x16b281 (0x57fe59bf8281 in /usr/bin/python3)

=========================== short test summary info ============================
FAILED tests/python/test_ops.py::test_correctness_define_tensor_float32 - RuntimeError: INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/runtime/executor_utils.cpp":708, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. KernelArgumentHolder contains less argument than kernel's input.
======================= 1 failed, 895 deselected in 2.00s =======================
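
One thing worth noting in the generated repro: the definition calls `fd.define_tensor` twice (T0 and T1), while the `inputs` list holds a single tensor. That would be consistent with the "KernelArgumentHolder contains less argument than kernel's input" message, though the thread has not confirmed it as the root cause. Below is a minimal sketch, assuming nothing beyond the Python standard library, that counts the `define_tensor` calls in the repro text and compares the count with the number of inputs passed. The helper name `count_define_tensor_calls` and the constant names are hypothetical and not part of nvFuser or the test suite; the script is only parsed, never executed, so nvFuser and a GPU are not needed.

```python
import ast

# Fusion definition copied from the generated repro script above.
# It is only parsed with `ast`, so FusionDefinition/DataType need not exist.
REPRO_DEFINITION = '''
def nvfuser_fusion_id0(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(shape=[1, -1], contiguity=[None, True], dtype=DataType.Float, is_cpu=False, stride_order=[1, 0])
    T1 = fd.define_tensor(shape=[1, 2], contiguity=[True, None], dtype=DataType.Float, is_cpu=False, stride_order=[0, 1])
    T2 = fd.ops.add(T0, T1)
    fd.add_output(T2)
'''

# The repro's `inputs` list contains exactly one make_tensor(...) entry.
NUM_INPUTS_PASSED = 1


def count_define_tensor_calls(source: str) -> int:
    """Count calls of the form `<something>.define_tensor(...)` in `source`.

    Hypothetical helper for illustration only.
    """
    tree = ast.parse(source)
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "define_tensor"
    )


declared = count_define_tensor_calls(REPRO_DEFINITION)
print(f"tensors declared via define_tensor: {declared}")
print(f"tensors passed to fd.execute:       {NUM_INPUTS_PASSED}")
if declared != NUM_INPUTS_PASSED:
    print("mismatch: the fusion declares more inputs than the test supplies")
```

If this reading is right, the OpInfo test for `define_tensor` ends up building a fusion with an extra input tensor compared with what the sample provides; where that extra `define_tensor` comes from is not established here.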