OAID / Tengine

Tengine is a lightweight, high-performance, modular inference engine for embedded devices.
Apache License 2.0

Prerun multithread graph failed (TIMVX) #702

Open · pl561 opened this issue 3 years ago

pl561 commented 3 years ago

Hello, I converted a PyTorch HRNet model to uint8 and got the following output from the prerun_graph_multithread function. My source code is similar to tm_unet.cpp, adapted for uint8 and loading the VSI device.

I successfully ran the conversion steps pth -> onnx -> tmfile (fp32) -> tmfile (uint8) with the latest conversion and quantization tools provided by this repository.
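For reference, the pipeline above roughly corresponds to commands like the following. This is a sketch only: the tool names follow the Tengine docs, but exact flags can differ between versions, and the input shape, mean, and scale values here are placeholders, not the ones I actually used.

```shell
# 1) pth -> onnx is done in PyTorch via torch.onnx.export (see below in the thread).

# 2) onnx -> tmfile (fp32) with Tengine's convert tool
./convert_tool -f onnx -m hrnet.onnx -o hrnet_fp32.tmfile

# 3) tmfile (fp32) -> tmfile (uint8) with the uint8 quantization tool
#    (./dataset, the 3,256,256 shape, and the mean/scale values are placeholders)
./quant_tool_uint8 -m hrnet_fp32.tmfile -i ./dataset -o hrnet_uint8.tmfile \
    -g 3,256,256 -w 0,0,0 -s 0.0039,0.0039,0.0039
```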

Detailed output:

tengine-lite library version: 1.4-dev
--> add_context_device VSI DEVICE success.
--> Create graph success.
--> Get input tensor success.
--> Set input tensor shape success.
--> Set input tensor buffer success.
Prerun multithread graph:
Tensor: Tensor_name(stage2.0.fuse_layers.0.1.2.weight) tensor_index(97) tensor_data_type(0) .
Tensor: Tensor_name(stage3.0.fuse_layers.0.1.2.weight) tensor_index(178) tensor_data_type(0) .
Tensor: Tensor_name(stage3.0.fuse_layers.0.2.2.weight) tensor_index(184) tensor_data_type(0) .
W [vsi_nn_SortGraphNode:1336]Unprocessed node 65
W [vsi_nn_SetupGraph:626]Sort graph nodes failure.
Tengine Fatal: Pre-run subgraph(2) on TIMVX failed.
Tengine: Scheduler(sync) prerun failed.
Prerun multithread graph failed.

About Tensor: Tensor_name(stage2.0.fuse_layers.0.1.2.weight) tensor_index(97) tensor_data_type(0): after some debugging, I found that in timvx_executor.cc, VXEngine::VXTensorMap (line 74), ir_tensor->data_type is not recognized as tim::vx::DataType::UINT8. Shouldn't these weights have been quantized to uint8? Why does this happen?
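The tensor_data_type codes printed in the log can be decoded against the data-type enum declared in tengine_c_api.h. A small sketch, assuming the enum layout below matches the Tengine-lite version in use: code 0 is fp32, which would mean those fuse_layers weights stayed fp32 after quantization, so VXTensorMap never sees UINT8 for them.

```python
import re

# Tengine-lite tensor data type codes, as declared in tengine_c_api.h
# (assumption: this enum layout matches the commit used in this issue).
TENGINE_DATA_TYPES = {
    0: "TENGINE_DT_FP32",
    1: "TENGINE_DT_FP16",
    2: "TENGINE_DT_INT8",
    3: "TENGINE_DT_UINT8",
    4: "TENGINE_DT_INT32",
    5: "TENGINE_DT_INT16",
}

def decode_tensor_log(line: str) -> str:
    """Extract the tensor_data_type(N) code from a Tengine log line and name it."""
    m = re.search(r"tensor_data_type\((\d+)\)", line)
    code = int(m.group(1))
    return TENGINE_DATA_TYPES.get(code, f"unknown({code})")

print(decode_tensor_log(
    "Tensor: Tensor_name(stage2.0.fuse_layers.0.1.2.weight) "
    "tensor_index(97) tensor_data_type(0) ."
))  # prints TENGINE_DT_FP32
```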

Then, regarding the W [vsi_nn_SortGraphNode:1336]Unprocessed node 65 warning: how should I interpret this error message?

Looking forward to learning more. Thank you in advance for your help~

kalcohol commented 3 years ago

It seems the ops in your ONNX model were exported with opset 13. Please re-export your pth model using opset 11 and try again.

pl561 commented 3 years ago

Hello, thank you for your answer. I repeated the same steps and made sure the ONNX model's opset was set to 11. Unfortunately, I still got the same error.

I checked with 2 types of model initialization (all zeros and a normal distribution N(0, 1)) in case it had an influence; both resulted in the same error output.

Thanks~

PS: I used Tengine at commit e6152e2a1e0fcd39701d5da19f17f33ef9823390 (Mon May 31 15:02:37 2021)

BUG1989 commented 3 years ago

Hi, HRNet is now supported, please try again. We keep the HRNet model available in the model zoo.