Open jjsjann123 opened 2 years ago
Will get a repro once I've verified the issue on the devel branch.
Currently the issue comes from the microbenchmarks: https://github.com/pytorch/pytorch/issues/75282
autogen-15 fails. TorchScript IR:
with prim::CudaFusionGroup_0 = graph(%10 : Float(768, strides=[1], requires_grad=0, device=cuda:0),
      %13 : Float(512, 768, strides=[768, 1], requires_grad=0, device=cuda:0),
      %11 : int):
  %7 : int[] = prim::Constant[value=[512, 768]]()
  %14 : int[] = prim::Constant[value=[1, 512, 768]]()
  %15 : Float(1, 512, 768, strides=[393216, 768, 1], requires_grad=0, device=cuda:0) = prim::reshape_copy(%13, %14)
  %12 : Float(1, 512, 768, strides=[393216, 768, 1], requires_grad=0, device=cuda:0) = aten::add(%15, %10, %11)
  %5 : Float(1, 512, 768, strides=[393216, 768, 1], requires_grad=0, device=cuda:0) = prim::reshape_copy(%12, %14)
  %2 : Float(512, 768, strides=[768, 1], requires_grad=0, device=cuda:0) = prim::reshape_copy(%12, %7)
  return (%2, %5)
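For reference, a minimal Python sketch of what this fusion group computes (the shapes, the alpha argument, and the two reshape consumers are read off the IR above; the use of torch.jit.script, the warm-up loop, and the CUDA device are assumptions, not a verified standalone repro):

import torch

# Sketch reconstructed from the fusion group above; not a verified repro.
@torch.jit.script
def fused(bias, x, alpha: int):
    y = x.reshape(1, 512, 768)           # prim::reshape_copy(%13, [1, 512, 768])
    z = torch.add(y, bias, alpha=alpha)  # aten::add(%15, %10, %11)
    # two reshape_copy consumers of the same add result, as in the fused graph
    return z.reshape(512, 768), z.reshape(1, 512, 768)

bias = torch.randn(768, device="cuda")
x = torch.randn(512, 768, device="cuda")
for _ in range(3):  # run a few times so the profiling executor can actually fuse
    fused(bias, x, 1)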
Fusion IR and error:
Inputs:
  T0_g[ iS0{i1} ], float
  T1_g[ iS1{i2}, iS2{i3} ], float
  i4, int64_t
Outputs:
  T6_g[ rS15{1}, iS16{i2}, iS17{i3} ], float
  T5_g[ bS12{1}, iS13{i2}, iS14{i3} ], float

%kernel_math {
  T2_l[ bS3{1}, iS4{i2}, iS5{i3} ] = broadcast( T1_g[ iS1{i2}, iS2{i3} ] )
  T3_l[ bS6{1}, bS7{1}, iS8{i1} ] = broadcast( T0_g[ iS0{i1} ] )
  d6 = (double)(i4);
  T4_l[ bS9{1}, bS10{1}, iS11{i1} ] = T3_l[ bS6{1}, bS7{1}, iS8{i1} ] * d6;
  T5_g[ bS12{1}, iS13{i2}, iS14{i3} ] = T2_l[ bS3{1}, iS4{i2}, iS5{i3} ] + T4_l[ bS9{1}, bS10{1}, iS11{i1} ];
  T6_g[ rS15{1}, iS16{i2}, iS17{i3} ] = reduction( T5_g[ bS12{1}, iS13{i2}, iS14{i3} ], op = add, initial value = double(0), fused = 0 )
}

Traceback (most recent call last):
  File "run_microbenchmarks.py", line 24, in <module>
    run()
  File "run_microbenchmarks.py", line 19, in run
    microbenchmark.run(bm_args)
  File "/raid/playground/nick/benchmark/torchbenchmark/microbenchmarks/nvfuser/__init__.py", line 150, in run
    run_nvfuser_microbenchmarks(extra_args=args)
  File "/raid/playground/nick/benchmark/torchbenchmark/microbenchmarks/nvfuser/__init__.py", line 146, in run_nvfuser_microbenchmarks
    outputs.append((fuser, b.run_test(inputs, fuser)))
  File "/raid/playground/nick/benchmark/torchbenchmark/microbenchmarks/nvfuser/__init__.py", line 125, in run_test
    return run_test(self.ir, inputs, warmup_runs=self.warmup_runs, test_runs=self.test_runs)
  File "/raid/playground/nick/benchmark/torchbenchmark/microbenchmarks/nvfuser/__init__.py", line 87, in run_test
    graph(*inputs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: rhs_i >= 0 && lhs_i >= 0 INTERNAL ASSERT FAILED at "/raid/pytorch/torch/csrc/jit/codegen/cuda/scheduler/pointwise.cpp":668, please report a bug to PyTorch.
Repro'd this on the master branch with the microbenchmark in David's repo. I'll extract a repro on our devel branch and raise this issue with the team.
So this is a reshape issue, which falls under our view issues. I'm disabling reshape fusion on upstream for now to avoid the codegen error: https://github.com/pytorch/pytorch/pull/75539
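Until that lands, a possible user-side workaround is to fall back to the non-fused path; a short sketch assuming the PyTorch 1.x TorchScript API (this is the generic nvFuser toggle, not anything specific to this issue):

import torch

# Turn nvFuser off for scripted/traced code so the failing fusion group is never generated.
torch._C._jit_set_nvfuser_enabled(False)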