apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Bug] Running test_conv2d.py got error output in Ethos-N78 platform. #13191

Closed: kychao3074 closed this issue 2 years ago

kychao3074 commented 2 years ago

Expected behavior

Running test_conv2d.py should produce correct output on the Ethos-N78 platform.

Actual behavior

Running test_conv2d.py produces incorrect output on the Ethos-N78 platform.

Environment

Ubuntu 20.04.4 LTS
GCC 9.4.0
TVM 0.10.dev0

Steps to reproduce

~/tvm/tests/python/contrib/test_ethosn$ vi test_conv2d.py
...
        print("shape:{}, out_channels:{}, kernel_h:{}, kernel_w:{}".format(shape, out_channels, kernel_h, kernel_w))
        print("pad:{}, stride:{}, dilation:{}, qnn_per_channel:{}".format(pad, stride, dilation, qnn_per_channel))
...
if __name__ == "__main__":
    test_conv2d("uint8", False)

~/tvm/tests/python/contrib/test_ethosn$ python3 test_conv2d.py
shape:(1, 17, 20, 26), out_channels:4, kernel_h:3, kernel_w:1
pad:attr, stride:(2, 2), dilation:(1, 1), qnn_per_channel:False
/home/user/tvm/python/tvm/driver/build_module.py:266: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
  warnings.warn(
conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
Traceback (most recent call last):
  File "test_conv2d.py", line 354, in <module>
    test_conv2d("uint8", False)
  File "test_conv2d.py", line 208, in test_conv2d
    tei.verify(outputs, dtype, 1)
  File "/home/user/tvm/tests/python/contrib/test_ethosn/infrastructure.py", line 244, in verify
    tvm.testing.assert_allclose(outs[0].numpy(), outs[1].numpy(), rtol=rtol, atol=atol)
  File "/home/user/tvm/python/tvm/testing/utils.py", line 119, in assert_allclose
    np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol, verbose=True)
  File "/home/user/.local/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 1527, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/user/.local/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError: Not equal to tolerance rtol=1e-07, atol=1

Mismatched elements: 360 / 360 (100%)
Max absolute difference: 251
Max relative difference: 110.
 x: array([[[[ 97, 93, 94, 96],
         [103, 101, 100, 98],
         [100, 103, 100, 101],...
 y: array([[[[ 56, 178, 163, 1],
         [ 0, 0, 0, 0],
         [144, 190, 140, 1],...
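The failing check is `np.testing.assert_allclose`, which treats an element as matching when |actual - desired| <= atol + rtol * |desired|. The minimal pure-Python sketch below applies that rule to the first four values printed for `x` and `y` in the log above. Only those visible values are used (the full 360-element tensors are not in the log), and the helper names `allclose_mask` and `mismatch_report` are hypothetical, not part of numpy or TVM.

```python
# Sketch of the element-wise rule used by numpy.testing.assert_allclose:
# an element matches when |actual - desired| <= atol + rtol * |desired|.
# Helper names are hypothetical, chosen for this illustration only.


def allclose_mask(actual, desired, rtol=1e-7, atol=1.0):
    """Return one bool per element: True where the pair is within tolerance."""
    return [abs(a - d) <= atol + rtol * abs(d) for a, d in zip(actual, desired)]


def mismatch_report(actual, desired, rtol=1e-7, atol=1.0):
    """Return (mismatched count, total count, max absolute difference)."""
    mask = allclose_mask(actual, desired, rtol, atol)
    max_abs = max(abs(a - d) for a, d in zip(actual, desired))
    return mask.count(False), len(mask), max_abs


# First row of x and y as printed in the AssertionError output.
x_row = [97, 93, 94, 96]
y_row = [56, 178, 163, 1]

print(allclose_mask(x_row, y_row))    # [False, False, False, False]
print(mismatch_report(x_row, y_row))  # (4, 4, 95)
```

With `atol=1` (the tolerance passed to `tei.verify`), each element may differ by only about one unit, so differences of 41 to 95 in every visible element are far outside what small numerical differences would explain.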

Triage

byoc:ethosn

lhutton1 commented 2 years ago

Hi @kychao3074, the output mismatch is a bit deceiving here; it looks as though the inference is actually failing to run on the NPU. Because of this, the runtime returns whatever values happen to be in memory. This should be improved by #13022 (once it is merged), which adds error messages to the runtime instead of failing silently.

I suspect the tests are being compiled for the wrong NPU variant. You can change the default variant with the ETHOSN_VARIANT_CONFIG environment variable, see: https://github.com/apache/tvm/blob/main/tests/scripts/task_python_ethosn_tests.sh#L30.

(Note that I'm hoping to improve this interface to use the same target string that would be expected by user-facing code here: https://github.com/apache/tvm/pull/13159/files#diff-3db627798cc5ccaa213c7e6773dd76906fc1cc2f861320d5f40e9f9b0fe52650R32, so this method may change in the future)
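The environment-variable override can be sketched as follows. This is a hypothetical stand-in, not TVM's actual `get_ethosn_variant()`; it assumes only what this thread states: the tests read the value of `ETHOSN_VARIANT_CONFIG` and fall back to `Ethos-N78_1TOPS_2PLE_RATIO` when it is unset.

```python
import os

# Default quoted in this thread for tests/python/contrib/test_ethosn/infrastructure.py.
DEFAULT_VARIANT = "Ethos-N78_1TOPS_2PLE_RATIO"


def get_variant(environ=None):
    """Return the Ethos-N variant to compile for.

    Hypothetical stand-in for get_ethosn_variant(): it reads
    ETHOSN_VARIANT_CONFIG and falls back to DEFAULT_VARIANT when unset.
    Passing a dict instead of os.environ keeps the function easy to test.
    """
    environ = os.environ if environ is None else environ
    return environ.get("ETHOSN_VARIANT_CONFIG", DEFAULT_VARIANT)


if __name__ == "__main__":
    print(get_variant({}))  # falls back to the default
    print(get_variant({"ETHOSN_VARIANT_CONFIG": "Ethos-N78_4TOPS_4PLE_RATIO"}))
```

In practice the variable would be set in the shell before running the test, for example `ETHOSN_VARIANT_CONFIG=Ethos-N78_4TOPS_4PLE_RATIO python3 test_conv2d.py`.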

kychao3074 commented 2 years ago

Hi @lhutton1. Our Ethos-N78 platform currently uses ethos-n-driver-stack version 22.05, so we can only use TVM versions from between 2022/08/02 and 2022/09/28. I will test #13022 once we update ethos-n-driver-stack to version 22.08.

I saw that get_ethosn_variant() in tvm/tests/python/contrib/test_ethosn/infrastructure.py defaults ETHOSN_VARIANT_CONFIG to Ethos-N78_1TOPS_2PLE_RATIO. How can I find out which ETHOSN_VARIANT_CONFIG value our Ethos-N78 platform needs? Or are there other ETHOSN_VARIANT_CONFIG options I can set for our platform?

Thanks very much for your help!
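The variant names quoted in this thread appear to encode three settings: the NPU model, its throughput in TOPS, and its PLE ratio. The hypothetical helper below splits such a name, assuming the `<model>_<N>TOPS_<M>PLE_RATIO` pattern inferred from `Ethos-N78_1TOPS_2PLE_RATIO` and `Ethos-N78_4TOPS_4PLE_RATIO`. It is an illustration only and is not part of TVM.

```python
import re
from typing import NamedTuple

# Assumed naming pattern, inferred from the two variant names quoted in this thread.
_VARIANT_RE = re.compile(r"^(?P<model>Ethos-N\d+)_(?P<tops>\d+)TOPS_(?P<ple>\d+)PLE_RATIO$")


class VariantConfig(NamedTuple):
    model: str
    tops: int
    ple_ratio: int


def parse_variant(name: str) -> VariantConfig:
    """Split a name such as 'Ethos-N78_4TOPS_4PLE_RATIO' into its parts."""
    match = _VARIANT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised Ethos-N variant string: {name!r}")
    return VariantConfig(match["model"], int(match["tops"]), int(match["ple"]))


if __name__ == "__main__":
    print(parse_variant("Ethos-N78_1TOPS_2PLE_RATIO"))
    print(parse_variant("Ethos-N78_4TOPS_4PLE_RATIO"))
```

The helper only checks the naming pattern; it cannot tell which configuration a particular board implements. In this thread that was settled empirically, by switching to Ethos-N78_4TOPS_4PLE_RATIO.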

kychao3074 commented 2 years ago

Hi @lhutton1. We set ETHOSN_VARIANT_CONFIG to Ethos-N78_4TOPS_4PLE_RATIO, and test_conv2d.py now produces correct output on our Ethos-N78 platform, so this bug is solved. Thanks very much for your help!

lhutton1 commented 2 years ago

No problem, great to hear! I'll close this issue; feel free to open another if you have any more questions.

kychao3074 commented 2 years ago

OK! Thank you so much!