apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Bug] batch_norm gives inconsistent inference results between PyTorch and TVM #15280

Open jikechao opened 1 year ago

jikechao commented 1 year ago

If the attribute training of the operator batch_norm is set to True, TVM produces inference results that are inconsistent with PyTorch for the same input.

Actual behavior

(screenshot: the TVM and PyTorch outputs differ for the same input)

Steps to reproduce

import torch
from tvm import relay
import tvm
import numpy as np
from torch.nn import Module

input_data = torch.randn([2, 3, 4, 4, 4], dtype=torch.float32)
para_1 = torch.randn([3], dtype=torch.float32)
para_2 = torch.randn([3], dtype=torch.float32)
para_3 = torch.randn([3], dtype=torch.float32)
para_4 = torch.randn([3], dtype=torch.float32)

class batch_norm(Module):
    def forward(self, *args):
        return torch.nn.functional.batch_norm(args[0], para_1, para_2, para_3, para_4, training=True)

m = batch_norm().float().eval()
torch_outputs = m(input_data)

trace = torch.jit.trace(m, input_data)
input_shapes = [('input0', torch.Size([2, 3, 4, 4, 4]))]

mod, params = relay.frontend.from_pytorch(trace, input_shapes)
with tvm.transform.PassContext(opt_level=3):
    exe = relay.create_executor('graph', mod=mod, params=params, device=tvm.device('llvm', 0), target='llvm').evaluate()

input_tvm = {'input0': np.array(input_data, dtype='float32')}
tvm_outputs = exe(**input_tvm).asnumpy()

np.testing.assert_allclose(torch_outputs, tvm_outputs, rtol=1e-3, atol=1e-3)
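
For comparison (my own note, not part of the original report): if the same module is converted with training=False, the two stacks agree within the given tolerance, which suggests the mismatch comes from the Relay frontend lowering batch_norm with inference semantics (normalizing with the running statistics) while PyTorch in training mode normalizes with the current batch statistics and updates the running stats in place. A minimal sketch, reusing the tensors and input_shapes defined above:

# Hedged sanity check (not from the report): the same pipeline with training=False.
class batch_norm_eval(Module):
    def forward(self, *args):
        return torch.nn.functional.batch_norm(
            args[0], para_1, para_2, para_3, para_4, training=False)

m_eval = batch_norm_eval().float().eval()
torch_eval_outputs = m_eval(input_data)

trace_eval = torch.jit.trace(m_eval, input_data)
mod_eval, params_eval = relay.frontend.from_pytorch(trace_eval, input_shapes)
with tvm.transform.PassContext(opt_level=3):
    exe_eval = relay.create_executor(
        'graph', mod=mod_eval, params=params_eval,
        device=tvm.device('llvm', 0), target='llvm').evaluate()

tvm_eval_outputs = exe_eval(**{'input0': np.array(input_data, dtype='float32')}).asnumpy()

# In eval mode both sides use the same (current) running statistics, so they should match.
np.testing.assert_allclose(torch_eval_outputs, tvm_eval_outputs, rtol=1e-3, atol=1e-3)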

Triage

zyc-bit commented 9 months ago

batch_norm doesn't have a CUDA schedule, why? After I apply relay.transform.FuseOps to the Relay module, I get this error:

batch_norm is not optimized for this platform.
……
raise RuntimeError(f"schedule not registered for '{target}'")
RuntimeError: schedule not registered for 'cuda -keys=cuda,gpu -arch=sm_80 -max_num_threads=1024 -thread_warp_size=32'
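
A workaround that may apply here (a sketch, not verified against this exact model): run relay.transform.SimplifyInference before FuseOps. That pass rewrites nn.batch_norm (and dropout) into plain elementwise ops, which do have CUDA schedules, so the unregistered-schedule error is avoided. It assumes mod is the Relay IRModule being compiled:

import tvm
from tvm import relay

# Sketch, assuming `mod` is the Relay IRModule targeted at CUDA.
seq = tvm.transform.Sequential([
    relay.transform.InferType(),          # SimplifyInference needs type information
    relay.transform.SimplifyInference(),  # decomposes nn.batch_norm into multiply/add
    relay.transform.FuseOps(fuse_opt_level=2),
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)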