apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0
11.6k stars 3.44k forks source link

tflite==2.4.2 breaks "test_forward.py::test_forward_mobilenet_v1" (and few other tests in the tflite frontend) #8339

Closed manupak closed 2 years ago

manupak commented 3 years ago

With recent docker environment update to #8306, the tflite version is upgraded to 2.4.2.

I'm raising this, as when we re-generate Docker images, this will certainly show up. The error messages I see are as follows :

[2021-06-25T10:33:33.284Z] =========================== short test summary info ============================
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_mobilenet_v1
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_mobilenet_v2
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_mobilenet_v3
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_sparse_mobilenet_v1
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_sparse_mobilenet_v2
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_inception_v3_net
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_inception_v4_net
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_inception_v4_net_batched
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_qnn_inception_v1_net
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_qnn_mobilenet_v1_net
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_qnn_mobilenet_v2_net
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_tflite2_qnn_resnet50
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_tflite2_qnn_inception_v1
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_tflite2_qnn_mobilenet_v2
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_coco_ssd_mobilenet_v1
[2021-06-25T10:33:33.284Z] FAILED tests/python/frontend/tflite/test_forward.py::test_forward_mediapipe_hand_landmark

A specific error message from the above list will be as follows :

[2021-06-25T10:33:33.284Z] tests/python/frontend/tflite/test_forward.py:188: in run_tvm_graph
[2021-06-25T10:33:33.284Z]     tflite_model, shape_dict=shape_dict, dtype_dict=dtype_dict
[2021-06-25T10:33:33.284Z] python/tvm/relay/frontend/tflite.py:3678: in from_tflite
[2021-06-25T10:33:33.284Z]     op_converter.convert_op_to_relay()
[2021-06-25T10:33:33.284Z] python/tvm/relay/frontend/tflite.py:231: in convert_op_to_relay
[2021-06-25T10:33:33.284Z]     ret = self.convert_map[op_code_str](op)
[2021-06-25T10:33:33.284Z] python/tvm/relay/frontend/tflite.py:1297: in convert_add
[2021-06-25T10:33:33.284Z]     return self._convert_elemwise(_op.add, op)
[2021-06-25T10:33:33.284Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2021-06-25T10:33:33.284Z] 
[2021-06-25T10:33:33.284Z] self = <tvm.relay.frontend.tflite.OperatorConverter object at 0x7fd4942d4d30>
[2021-06-25T10:33:33.284Z] relay_op = <function add at 0x7fd4fae05bf8>
[2021-06-25T10:33:33.284Z] op = <tflite.Operator.Operator object at 0x7fd49439c528>
[2021-06-25T10:33:33.284Z] ignore_qnn_params = False
[2021-06-25T10:33:33.284Z] 
[2021-06-25T10:33:33.284Z]     def _convert_elemwise(self, relay_op, op, ignore_qnn_params=False):
[2021-06-25T10:33:33.284Z]         """Generic method to Convert TFLite elemwise"""
[2021-06-25T10:33:33.284Z]         try:
[2021-06-25T10:33:33.284Z]             from tflite.AddOptions import AddOptions
[2021-06-25T10:33:33.284Z]             from tflite.SubOptions import SubOptions
[2021-06-25T10:33:33.284Z]             from tflite.MulOptions import MulOptions
[2021-06-25T10:33:33.284Z]             from tflite.DivOptions import DivOptions
[2021-06-25T10:33:33.284Z]             from tflite.BuiltinOptions import BuiltinOptions
[2021-06-25T10:33:33.284Z]         except ImportError:
[2021-06-25T10:33:33.284Z]             raise ImportError("The tflite package must be installed")
[2021-06-25T10:33:33.284Z]     
[2021-06-25T10:33:33.284Z]         input_tensors = self.get_input_tensors(op)
[2021-06-25T10:33:33.284Z] >       assert len(input_tensors) == 2, "input tensors length should be 2"
[2021-06-25T10:33:33.284Z] E       AssertionError: input tensors length should be 2

Therefore, I think we should stick with tflite=2.3.1 unless we include the changes to the tflite frontend / tests to make the CI green once the docker images are regenerated.

manupak commented 3 years ago

cc : @leandron @areusch @Lunderberg @mbrookhart

leandron commented 3 years ago

@Lunderberg @mbrookhart any idea on whether this is something trivial to fix? Otherwise I would suggest we revert #8306 while this is investigated.

mbrookhart commented 3 years ago

:( I would probably hit this today as I get closer to finally being able to build the images.

I think the issue is that we updated docker to cuda 11, and the older version of TF isn't compatible with cuda11. I'll poke and see if I can figure out what's going on.

Lunderberg commented 3 years ago

We might be able to still use the older version of TF, since I think it falls back to using the CPU implementation, but would add to the already long time to runtime of the CI. It also wouldn't be as direct of a comparison, since it would be comparing TVM+cuda to tflite+CPU.

I'd lean toward improving compatibility with TF 2.4.2, but haven't looked at that portion of TVM to know how difficult that would be.

mbrookhart commented 3 years ago

Does anyone know where an API reference for TFLite is? I'm not super familiar with this frontend, but this does seem to be tflite returning the wrong number of inputs. I'm kind of wondering if they merged "ADD" and "ADD_N"

manupak commented 3 years ago

Can we merge this reasonably sooner ?

https://github.com/apache/tvm/pull/8375

@mbrookhart @Lunderberg @leandron @u99127

leandron commented 3 years ago

8375 is now merged, so perhaps we can close this issue?

areusch commented 3 years ago

i think we should wait til #8177 is done, just to make sure the fix will stick.

u99127 commented 3 years ago

I thinks this can now be closed ?

manupak commented 2 years ago

I think this is solved now.