NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

codegen RuntimeError: indexed_id_map .emplace( ca_map.disjointSetOf(c_id, IdMappingMode::EXACT), ca_map.disjointSetOf(p_id, IdMappingMode::EXACT)) .second INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/scheduler/pointwise_utils.cpp":33 #309

Open jjsjann123 opened 1 year ago

jjsjann123 commented 1 year ago

Error:

Traceback (most recent call last):
  File "/opt/pytorch/pytorch/nvfuser/__init__.py", line 76, in execute
    result = self._execute(inputs, override_user_schedule)
RuntimeError: indexed_id_map .emplace( ca_map.disjointSetOf(c_id, IdMappingMode::EXACT), ca_map.disjointSetOf(p_id, IdMappingMode::EXACT)) .second INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/scheduler/pointwise_utils.cpp":33, please report a bug to PyTorch.
Traceback (most recent call last):
  File "/workspace/test.py", line 39, in <module>
    fd.execute(inputs)
  File "/opt/pytorch/pytorch/nvfuser/__init__.py", line 76, in execute
    result = self._execute(inputs, override_user_schedule)
RuntimeError: indexed_id_map .emplace( ca_map.disjointSetOf(c_id, IdMappingMode::EXACT), ca_map.disjointSetOf(p_id, IdMappingMode::EXACT)) .second INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/scheduler/pointwise_utils.cpp":33, please report a bug to PyTorch.

Repro script:

import torch
from nvfuser import FusionDefinition, DataType

def nvfuser_fusion_id57(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(symbolic_sizes=[-1], contiguity=[True], dtype=DataType.Float, is_cpu=False)
    T1 = fd.define_tensor(symbolic_sizes=[-1, -1, -1, -1], contiguity=[True, True, True, True], dtype=DataType.Float, is_cpu=False)
    T2 = fd.define_tensor(symbolic_sizes=[-1, -1, -1], contiguity=[True, True, True], dtype=DataType.Int, is_cpu=False)
    T3 = fd.ops.reshape(T0, original_shape=[4], new_shape=[4, 1, 1])
    T4 = fd.ops.max(T1, axes=[1], keepdim=False, dtype=DataType.Null)
    T5 = fd.ops.broadcast_in_dim(T4, output_shape=[3, 1, 2, 3], broadcast_dims=[0, 2, 3])
    T6 = fd.ops.broadcast_in_dim(T5, output_shape=[3, 4, 2, 3], broadcast_dims=[0, 1, 2, 3])
    T7 = fd.ops.sub(T1, T6)
    T8 = fd.ops.exp(T7)
    T9 = fd.ops.sum(T8, axes=[1], keepdim=False, dtype=DataType.Null)
    T10 = fd.ops.broadcast_in_dim(T9, output_shape=[3, 1, 2, 3], broadcast_dims=[0, 2, 3])
    T11 = fd.ops.log(T10)
    T12 = fd.ops.broadcast_in_dim(T11, output_shape=[3, 4, 2, 3], broadcast_dims=[0, 1, 2, 3])
    T13 = fd.ops.sub(T7, T12)
    T14 = fd.ops.neg(T13)
    T15 = fd.ops.broadcast_in_dim(T3, output_shape=[3, 4, 2, 3], broadcast_dims=[1, 2, 3])
    T16 = fd.ops.mul(T14, T15)
    T17 = fd.ops.reshape(T2, original_shape=[3, 2, 3], new_shape=[3, 1, 2, 3])
    T18 = fd.ops.gather(T16, T17, dim=1)
    T19 = fd.ops.sum(T18, axes=[0, 1, 2, 3], keepdim=False, dtype=DataType.Null)
    T20 = fd.ops.broadcast_in_dim(T3, output_shape=[3, 4, 2, 3], broadcast_dims=[1, 2, 3])
    T21 = fd.ops.gather(T20, T17, dim=1)
    T22 = fd.ops.sum(T21, axes=[0, 1, 2, 3], keepdim=False, dtype=DataType.Null)
    T23 = fd.ops.div(T19, T22)
    fd.add_output(T23)

with FusionDefinition() as fd:
    nvfuser_fusion_id57(fd)

inputs = [
    torch.randn((4,), dtype=torch.float32, device='cuda:0').as_strided((4,), (1,)),
    torch.randn((3, 4, 2, 3), dtype=torch.float32, device='cuda:0').as_strided((3, 4, 2, 3), (24, 6, 3, 1)),
    torch.randint(0, 10, (3, 2, 3), dtype=torch.int64, device='cuda:0').as_strided((3, 2, 3), (6, 3, 1)),
]
fd.execute(inputs)
jjsjann123 commented 1 year ago

This seems to be a new issue that I didn't run into previously. It comes from running a cross_entropy example; I haven't bisected it yet...
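
For context, my reading of the fusion above is that it computes a weighted cross-entropy loss with mean reduction. A rough eager-mode equivalent (my interpretation of the definition, not the original script) looks like this:

import torch
import torch.nn.functional as F

# Shapes mirror the repro: 4 classes, input [N=3, C=4, 2, 3].
# Targets are kept in range [0, 4) here so the eager reference is valid.
weight = torch.randn(4, device='cuda')
logits = torch.randn(3, 4, 2, 3, device='cuda')
target = torch.randint(0, 4, (3, 2, 3), device='cuda')

# log-softmax over the class dim, gather the per-target terms scaled by the
# class weights, then normalize by the sum of the gathered weights.
log_probs = F.log_softmax(logits, dim=1)
w = weight.view(1, 4, 1, 1)
picked = (-log_probs * w).gather(1, target.unsqueeze(1))
picked_w = w.expand(3, 4, 2, 3).gather(1, target.unsqueeze(1))
loss = picked.sum() / picked_w.sum()

# Should match F.cross_entropy(logits, target, weight=weight, reduction='mean')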

jjsjann123 commented 1 year ago

cc'ing @naoyam

naoyam commented 1 year ago

It's definitely a problem, but aren't we supposed to use take_along_axis instead of gather?

jjsjann123 commented 1 year ago

Good point. I will switch to that and see if the issue goes away.

Sorry, my code base is a little awkward at the moment. There are some recent changes I need to clean up for this PR.
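
For reference, a minimal sketch of that switch on a toy fusion (assuming fd.ops.take_along_axis takes the same (input, index, dim) arguments as fd.ops.gather, with the index extents matching the input on all non-indexed dimensions):

import torch
from nvfuser import FusionDefinition, DataType

def fusion_take_along_axis(fd : FusionDefinition) -> None :
    # Toy example, not the repro above: index along dim 1 of a 2D tensor.
    T0 = fd.define_tensor(symbolic_sizes=[-1, -1], contiguity=[True, True], dtype=DataType.Float, is_cpu=False)
    T1 = fd.define_tensor(symbolic_sizes=[-1, -1], contiguity=[True, True], dtype=DataType.Int, is_cpu=False)
    T2 = fd.ops.take_along_axis(T0, T1, dim=1)
    fd.add_output(T2)

with FusionDefinition() as fd:
    fusion_take_along_axis(fd)

inputs = [
    torch.randn((3, 4), dtype=torch.float32, device='cuda:0'),
    torch.randint(0, 4, (3, 1), dtype=torch.int64, device='cuda:0'),
]
outputs = fd.execute(inputs)

In the repro itself the change would just be swapping the two gather calls, i.e. T18 = fd.ops.take_along_axis(T16, T17, dim=1) and T21 = fd.ops.take_along_axis(T20, T17, dim=1); T17 is already reshaped to [3, 1, 2, 3], so its extents match the producers on every dimension except the indexed one.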