csarofeen / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

`var_mean` fails when reduction occurs across all possible tensor dimensions #2486

Closed kevinstephano closed 1 year ago

kevinstephano commented 1 year ago

🐛 Describe the bug

This bug in `var_mean` is specific to the case where the reduction occurs across all dimensions of the input tensor. If you remove any one dimension from the reduction axes, it works fine (see the working variant after the repro below).

import torch
from nvfuser import FusionDefinition

inputs = [
    torch.randn(2, 2, 2, device='cuda'),
]

with FusionDefinition() as fd:
    t0 = fd.from_pytorch(inputs[0])
    # Reduce across all three axes with correction=0 (population variance).
    t1, t2 = fd.ops.var_mean(t0, [0, 1, 2], 0)
    fd.add_output(t1)
    fd.add_output(t2)

out = fd.execute(inputs)
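
For contrast, a minimal sketch of the passing case described above: dropping one axis from the reduction (here axis 2, chosen arbitrarily) avoids the failure. `fd2` is just a fresh definition name; the inputs and imports are reused from the repro.

# Same setup, but reduce over only two of the three axes;
# per the report, this variant executes without the assert.
with FusionDefinition() as fd2:
    t0 = fd2.from_pytorch(inputs[0])
    t1, t2 = fd2.ops.var_mean(t0, [0, 1], 0)  # axis 2 left unreduced
    fd2.add_output(t1)
    fd2.add_output(t2)

out2 = fd2.execute(inputs)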

Error:

[W kernel_ir.cpp:532] Warning: Unroll required but not possible. Register allocation disabled. Loop index: i107 (function isUnrolled)
Traceback (most recent call last):
  File "/workspace/test.py", line 14, in <module>
    out = fd.execute(inputs)
  File "/opt/pytorch/pytorch/nvfuser/__init__.py", line 22, in execute
    return self._execute(inputs)
RuntimeError: false INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch-jit/third_party/nvfuser/csrc/executor.cpp":327, please report a bug to PyTorch. Allocations must be based on constant integers for local memory. However, found: T9_l[ iS61{( ceilDiv(( ceilDiv(( ceilDiv(( ceilDiv(( T0.size[0] * ( T0.size[1] * T0.size[2] ) ), 4) ), blockDim.x) ), 1) ), gridDim.x) )}, iblockIdx.x60{gridDim.x}, ithreadIdx.x57{blockDim.x}, iUS59{1}, iV55{4} ],  have dynamic allocations but are placed in local memory.

Versions

ToT (top of tree)

IvanYashchuk commented 1 year ago

Interesting; this case seems to be covered by a PyTorch OpInfo test: https://github.com/pytorch/pytorch/blob/c6d8d10b3e974019dae7ec91a85c6192c6d511fa/torch/testing/_internal/common_methods_invocations.py#L19823-L19826. Maybe it was working in a previous version of nvFuser.
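
For reference, the eager-mode call those samples exercise looks roughly like this (a sketch assuming a PyTorch build where `torch.var_mean` accepts the `correction` keyword; `correction=0` matches the third argument in the repro):

import torch

t = torch.randn(2, 2, 2, device='cuda')
# Population variance (correction=0) and mean over all dimensions;
# reducing every axis yields 0-dim (scalar) tensors.
var, mean = torch.var_mean(t, dim=[0, 1, 2], correction=0)
print(var.shape, mean.shape)  # torch.Size([]) torch.Size([])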

naoyam commented 1 year ago

Yes, this is a relatively recent regression.