Codegen error on transform_replay #1538

Open jjsjann123 opened 2 years ago

jjsjann123 commented 2 years ago

🐛 Describe the bug

The error I'm running into is:

root@f3d8903f445f:/raid/playground# PYTORCH_NVFUSER_DISABLE_FALLBACK=1 python animesh_repro.py

Traceback (most recent call last):
  File "animesh_repro.py", line 19, in <module>
    forward(*inps)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: it != replay_CasP.getReplay().end() INTERNAL ASSERT FAILED at "/raid/pytorch_10_1/torch/csrc/jit/codegen/cuda/transform_replay.cpp":491, please report a bug to PyTorch. Could not find axis, iS235{( ceilDiv(( i10 * i11 ), 1) )}, requested in replay.

Script for repro:

import torch                                                                                                            

def forward(i0, i1, i2, i3, i4, i5, i6, i7):                                                                                                                                                                       
  i_tb1 = torch.ops.aten.threshold_backward(i6, i7, 0)                                                                  
  i18 = torch.ops.aten.view(i_tb1, [1, 1024, 128, 128])                                                                 
  i19, i20, i21 = torch.ops.aten.native_batch_norm_backward(i18, i0, i1, i2, i3, i4, i5, False, 1e-5, [True, True, True])
  i22 = torch.ops.aten.view(i20, [16, 64])                                                                              
  i_s2 = torch.ops.aten.sum(i22, 0)                                                                                     
  i24 = torch.ops.aten.view(i21, [16, 64])                                                                              
  i_s1 = torch.ops.aten.sum(i24, 0)                                                                                     
  i26 = torch.ops.aten.view(i19, [16, 64, 128, 128])                                                                    
  return (i26, i_s1, i_s2)                                                                                              

inps = [(torch.Size([1, 1024, 128, 128]), torch.float32), (torch.Size([1024]), torch.float32), (torch.Size([1024]), torch.float32), (torch.Size([1024]), torch.float32), (torch.Size([0]), torch.float32), (torch.Size([0]), torch.float32), (torch.Size([16, 64, 128, 128]), torch.float32), (torch.Size([16, 64, 128, 128]), torch.float32)]
inps = [torch.randn(shape, dtype=dtype, device='cuda') for shape, dtype in inps]                                        
forward = torch.jit.script(forward)                                                                                     
with torch.jit.fuser("fuser2"):
  # run several times so the profiling executor reaches the optimized (fused) plan
  forward(*inps)
  forward(*inps)
  forward(*inps)
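
As noted in the comments below, this looks like one of the view issues. A stripped-down variant exercising only the view -> sum pattern from the repro might look like the sketch below (hypothetical; I have not verified that it triggers the same assert):

import torch

def view_then_sum(x):
  # mirrors the i20 -> view([16, 64]) -> sum(dim=0) chain in the full repro
  y = torch.ops.aten.view(x, [16, 64])
  return torch.ops.aten.sum(y, 0)

view_then_sum = torch.jit.script(view_then_sum)
x = torch.randn(1024, device='cuda')
with torch.jit.fuser("fuser2"):
  # repeated calls so the profiling executor reaches the optimized (fused) plan
  view_then_sum(x)
  view_then_sum(x)
  view_then_sum(x)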

Versions

This reproduces on my devel branch at HEAD commit 6df7b77b5ccf681694097a111ee0525f7bf1350f.

jjsjann123 commented 2 years ago

So this is one of the view issues (as Kevin pointed out to the team). Naoya tried Christian's view fix (I guess #1535), but it doesn't seem to help with this case.

Naoya is looking into the issue at the moment, so I'm assigning it to him for now.

naoyam commented 2 years ago

@jjsjann123 Is this still an issue? We've made a couple of improvements so far, including more robust handling of views and trivial reductions. As far as I remember, @rdspring1 said he's working on a scheduler fix, but I'm not sure what the current status is.