Open ntcmp2u opened 1 year ago
Hi, will anyone take care of this bug?
Hi, how was your model_0.onnx file generated?
Hi @CookedMelon , I generated the model with a custom fuzzer. Below I provide a code snippet that does not depend on external model files and illustrates what the model looks like. I think it will help developers locate which operator is being optimized incorrectly. Are you a developer of TVM? Will this bug be fixed?
```python
import onnxruntime as ort
import onnx
import numpy as np
import pickle
from numpy import testing
import tvm
from tvm import relay
import torch


# Fuzzer-generated model: tril -> div -> cast to int64 -> slice -> expand.
class Model0(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, *args):
        _args = args
        getitem = _args[0]
        tril = getitem.tril(0)
        div = torch.div(tril, tril)
        to = div.to(dtype=torch.int64)
        getitem_1 = to[(slice(-13, -12, 1), slice(None, None, None))]
        expand = getitem_1.expand(1, 25)
        return (expand,)


model_0 = Model0()
output_names_0 = ['v4_0']

# Load the pickled inputs and export the model to ONNX.
input_dict_0 = pickle.load(open('./0.pickle', 'rb'))
inputs_0 = tuple(torch.from_numpy(v).to('cpu') for _, v in input_dict_0.items())
torch.onnx.export(model_0, inputs_0, '0.onnx', verbose=False, input_names=['v5_0'],
                  output_names=output_names_0, opset_version=14,
                  do_constant_folding=False)

# Import the ONNX model into Relay.
onnx_model_0 = onnx.load('0.onnx')
onnx_model_outputs_0 = [node.name for node in onnx_model_0.graph.output]
shape_dict_0 = {key: val.shape for key, val in input_dict_0.items()}
mod_0, params_0 = relay.frontend.from_onnx(onnx_model_0, shape_dict_0, freeze_params=True)


def func():
    # Build and run the model with TVM at opt_level=4.
    with tvm.transform.PassContext(opt_level=4):
        executor_0 = relay.build_module.create_executor(
            "graph", mod_0, tvm.cpu(), tvm.target.Target("llvm"), params_0
        ).evaluate()
        executor_res_0 = [executor_0(**input_dict_0).numpy()]
        output_0 = dict(zip(onnx_model_outputs_0, executor_res_0))
        return output_0


# Run the same build twice; the two results should be identical.
output_0 = func()
output_1 = func()

print('=========================')
try:
    for tensor_name in output_names_0:
        testing.assert_allclose(output_0[tensor_name], output_1[tensor_name])
    print("no problem")
except AssertionError as e:
    print("assertion failure for inconsistency")
    print(e)
print('=========================')
```
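For reference, here is a minimal sketch (not part of the original PoC) of how one might also get a baseline from onnxruntime for the same '0.onnx' and './0.pickle' files written by the snippet above. It assumes the pickle keys match the exported input name and that `output_0` / `output_1` from the snippet are still in scope.

```python
# Hypothetical cross-check against onnxruntime, reusing the files written by the
# snippet above ('0.onnx', './0.pickle'). onnxruntime is reported to behave
# consistently across runs, so it can serve as a reference output.
import pickle
import numpy as np
from numpy import testing
import onnxruntime as ort

input_dict = pickle.load(open('./0.pickle', 'rb'))
sess = ort.InferenceSession('0.onnx', providers=['CPUExecutionProvider'])

# Feed the same inputs; this assumes the pickle keys match the ONNX input names.
ort_out = sess.run(['v4_0'], {k: np.asarray(v) for k, v in input_dict.items()})[0]

# Compare the onnxruntime reference with the two TVM results from the PoC above.
testing.assert_allclose(output_0['v4_0'], ort_out)
testing.assert_allclose(output_1['v4_0'], ort_out)
```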
I'm not a TVM developer. But I did find that some deep learning framework APIs do not work correctly in TVM. I've posted an issue about this phenomenon before: issue16016, "compile tensorflow with tvm but get unexpected return value". In fact, I've found that quite a few APIs behave this way.
Yep. TVM still needs more testing, since it has some unstable (or incorrect) optimizations/transformations for models given by users.
BTW, I raised this issue almost a month ago; it seems the TVM team is understaffed, so they haven't had time to handle it.
I use TVM to optimize an ONNX model, but the output of the optimized model is inconsistent.
The PoC code is as follows:
The model file and the test code are attached in report_bug.zip.
Expected behavior
Obviously, output_0 and output_1 should be identical because they are the outputs of the same model on the same inputs.
Actual behavior
output_0 and output_1 are different, and their values appear to be unstable across my several trials. In contrast, onnxruntime performs consistently on each run. The stdout of my several trials is as follows:
It is worth noting that when opt_level is set to 0, TVM performs consistently every time, so I assume the inconsistency is introduced by the TVM optimizations. A sketch of this check is given below.
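As a minimal sketch of that check (assuming mod_0, params_0, and input_dict_0 from the PoC above; this is how one might compare run-to-run stability at different opt_level settings, not code from the original report):

```python
# Hypothetical stability check: build and run the Relay module twice at a given
# opt_level and compare the two runs. Reuses mod_0, params_0, and input_dict_0
# from the PoC above.
import tvm
from tvm import relay
from numpy import testing

def run_once(opt_level):
    with tvm.transform.PassContext(opt_level=opt_level):
        executor = relay.build_module.create_executor(
            "graph", mod_0, tvm.cpu(), tvm.target.Target("llvm"), params_0
        ).evaluate()
        return executor(**input_dict_0).numpy()

for level in (0, 4):
    first, second = run_once(level), run_once(level)
    try:
        testing.assert_allclose(first, second)
        print(f"opt_level={level}: consistent across runs")
    except AssertionError:
        print(f"opt_level={level}: inconsistent across runs")
```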
Environment
OS: Ubuntu 22.04 LTS (Linux jin-pc 6.2.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jul 13 16:27:29 UTC 2 x86_64 x86_64 x86_64 GNU/Linux)
TVM: 0.14.dev226
Steps to reproduce
Unzip the attached zip file and run python3 test.py to execute the PoC.
Triage