VeriSilicon / tflite-vx-delegate

Tensorflow Lite external delegate based on TIM-VX
MIT License

VX Delegate produces wrong output #35

Closed: michaelnguyen11 closed this issue 2 years ago

michaelnguyen11 commented 2 years ago

Dear Supporters,

I'm using an i.MX 8M Plus EVK board with BSP 5_10_52-2_1_0, which uses TIM-VX version 1.1.32 (the BSP Yocto build clones TIM-VX from here: https://github.com/NXPmicro/tim-vx-imx).

When I use the VX Delegate to run a YoloV3 TFLite model, it produces wrong output. However, with the TFLite NNAPI Delegate, the model works correctly.

For example: this happens with the same model, the same pre-processing, and the same post-processing.

With the VX Delegate, the FPS is 3 times higher than with the NNAPI Delegate, so it would be great if the VX Delegate could produce correct results.

Have you seen this kind of issue before? Could you guide me in fixing it?

Many thanks in advance !

Regards, Hiep

sunshinemyson commented 2 years ago

@michaelnguyen11 would you mind sharing your model?

michaelnguyen11 commented 2 years ago

Hi @sunshinemyson ,

Here is my model: https://drive.google.com/file/d/18MrSmLv1T5rKEknBj2C5In6jmZ7bhAuW/view?usp=sharing . You can refer to this repository for the post-processing: https://github.com/david8862/keras-YOLOv3-model-set/tree/master/inference/tflite

michaelnguyen11 commented 2 years ago

Hi @sunshinemyson ,

I've installed the latest TIM-VX code with i.MX BSP 5_10_52-2_1_0, but the result is still the same: with the VX Delegate the model can't detect objects, while NNAPI can.

sunshinemyson commented 2 years ago

@michaelnguyen11 ,

We observe a similar issue on our side. The team is looking into it right now.

sunshinemyson commented 2 years ago

@michaelnguyen11 , after checking the model, we found a lot of dequantize ops with constant inputs. We don't support constant folding so far, so such constant inputs cannot be handled properly. Could you try to refine the model so it doesn't contain such dequantize operations?
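To illustrate what "constant folding" of those dequantize ops means, here is a minimal pure-Python sketch using TFLite's affine dequantization formula, real = scale * (q - zero_point): instead of leaving a DEQUANTIZE op with a constant input in the graph, the float values are computed once ahead of time. The values below are made-up illustration data, not taken from the model in this issue.

```python
# Fold a constant DEQUANTIZE op by hand: compute the float values once,
# offline, so the runtime never has to execute the op.
# TFLite affine dequantization: real = scale * (q - zero_point).

def dequantize(quantized, scale, zero_point):
    """Return the float equivalents of a quantized constant tensor."""
    return [scale * (q - zero_point) for q in quantized]

# Example: uint8 values with scale 0.5 and zero point 128.
folded = dequantize([0, 128, 255], scale=0.5, zero_point=128)
print(folded)  # [-64.0, 0.0, 63.5]
```

A converter that supports constant folding would do this replacement automatically; here the suggestion is to re-export the model so such ops never appear.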

michaelnguyen11 commented 2 years ago

Hi @sunshinemyson ,

Sorry for the late response.

I refined the model; now it contains only one quantize op at the input and one dequantize op at the output.

However, the result is still the same: with the VX Delegate the model can't detect objects, while NNAPI can.

Please help check it. Thanks in advance!

https://drive.google.com/drive/u/0/folders/1RVIRvLA7FN8iLfHAFqJiRedJYBLTdbk3

bkovalenkocomp commented 2 years ago

Hi, could this be related?

https://github.com/VeriSilicon/TIM-VX/issues/226

sunshinemyson commented 2 years ago

@bkovalenkocomp ,

I didn't see any constant inputs in your model's graph. You can try a layer dump for debugging:

https://github.com/VeriSilicon/TIM-VX/issues#issuecomment-986138283
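A hypothetical debugging session might enable the driver's verbose logging and per-layer profiling before running the delegate. The variable names below come from the VIV_VX_DEBUG_LEVEL mention later in this thread and from NXP's i.MX Machine Learning User's Guide; confirm them against the documentation for your BSP before relying on them.

```shell
export VIV_VX_DEBUG_LEVEL=1   # verbose logging from the OpenVX driver
export CNN_PERF=1             # per-layer performance dump
export NN_EXT_SHOW_PERF=1     # extended per-layer timing output
# Then run the app with the delegate, e.g. (path is illustrative):
# ./your_tflite_app --external_delegate_path=/usr/lib/libvx_delegate.so
```

Comparing the per-layer dump between a good run (NNAPI or CPU) and a bad run (VX Delegate) helps narrow down which layer diverges.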

sunshinemyson commented 2 years ago

> Hi @sunshinemyson ,
>
> Sorry for the late response.
>
> I refined the model; now it contains only one quantize op at the input and one dequantize op at the output.
>
> However, the result is still the same: with the VX Delegate the model can't detect objects, while NNAPI can.
>
> Please help check it. Thanks in advance!
>
> https://drive.google.com/drive/u/0/folders/1RVIRvLA7FN8iLfHAFqJiRedJYBLTdbk3

Yes, we will check it ASAP.

dmartinez-quercus commented 2 years ago

Hi @michaelnguyen11

I'm not 100% sure but I might know what is going on.

I'm having a similar issue with my YOLOv4-Tiny model, which has two output tensors. I've quantized it and deployed it to the i.MX8MPlus.

I can tell you that, when using this delegate, one of the output tensors does not deliver consistent results. The other output tensor gives good results. In other words, the model only detects "big objects" and misses the small ones when using the NPU.

I've analyzed the model and found that there is a "Resize"/"Upsample" layer that is not working properly. This layer is located in a parallel branch after the first output, which is the one that delivers valid results. I believe other YOLO models follow this pattern.

In my case, the Resize/Upsample layer simply transforms a 1x13x13x128 tensor into a 1x26x26x128 tensor. For some reason, this delegate is converting this layer into two sequential Deconvolution layers (1x13x13x128 to 1x13x13x512 to 1x26x26x128). I suspect this is causing the issue.

I dumped the profiling info of these two layers: vx_debug.txt

Unfortunately, I have no idea how to fix this.

Thanks.
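The Resize/Upsample step described above is simple enough to reproduce on the host, which gives a known-good reference to compare device output against. Below is a pure-Python sketch of a nearest-neighbour 2x upsample; the tiny grids are illustrative, not the 1x13x13x128 tensor from the model.

```python
# Reference nearest-neighbour 2x upsample: double height and width of a
# 2-D grid by repeating each element along both axes.

def upsample_nearest_2x(grid):
    """Return the grid with each element repeated 2x2."""
    out = []
    for row in grid:
        doubled = [v for v in row for _ in range(2)]  # repeat columns
        out.append(doubled)
        out.append(list(doubled))                     # repeat rows
    return out

print(upsample_nearest_2x([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Running the suspect branch's input through a reference like this and diffing against the NPU's tensor dump would confirm whether the resize-to-deconvolution rewrite is where the results diverge.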

bkovalenkocomp commented 2 years ago

> Hi @michaelnguyen11
>
> I'm not 100% sure but I might know what is going on.
>
> I'm having a similar issue with my YOLOv4-Tiny model, which has two output tensors. I've quantized it and deployed it to the i.MX8MPlus.
>
> I can tell you that, when using this delegate, one of the output tensors does not deliver consistent results. The other output tensor gives good results. In other words, the model only detects "big objects" and misses the small ones when using the NPU.
>
> I've analyzed the model and found that there is a "Resize"/"Upsample" layer that is not working properly. This layer is located in a parallel branch after the first output, which is the one that delivers valid results. I believe other YOLO models follow this pattern.
>
> In my case, the Resize/Upsample layer simply transforms a 1x13x13x128 tensor into a 1x26x26x128 tensor. For some reason, this delegate is converting this layer into two sequential Deconvolution layers (1x13x13x128 to 1x13x13x512 to 1x26x26x128). I suspect this is causing the issue.
>
> I dumped the profiling info of these two layers: vx_debug.txt
>
> Unfortunately, I have no idea how to fix this.
>
> Thanks.

That's interesting, I have upsample layers in my model too.

https://github.com/VeriSilicon/TIM-VX/issues/226

liyuenan2333 commented 2 years ago

Hi @dmartinez-quercus , @bkovalenkocomp

The vx-delegate will transform resize into deconvolution in some cases. You can turn off that feature and try again.

The trigger is at https://github.com/VeriSilicon/tflite-vx-delegate/blob/c862e75266bcccae1fdd2d6b91c9017c5d04a918/op_map.cc#L1020. Set it to false.

I hope it can solve your problem.

dmartinez-quercus commented 2 years ago

> Hi @dmartinez-quercus , @bkovalenkocomp
>
> The vx-delegate will transform resize into deconvolution in some cases. You can turn off that feature and try again.
>
> The trigger is at https://github.com/VeriSilicon/tflite-vx-delegate/blob/c862e75266bcccae1fdd2d6b91c9017c5d04a918/op_map.cc#L1020 . Set it to false.
>
> I hope it can solve your problem.

Hi @liyuenan2333

I've tried what you suggested. The vx-delegate now interprets this layer as "resize" instead of two "deconvolution" layers.

I'm seeing a significant improvement. Smaller objects are now detected the same way NNAPI detects them. Performance is good too.

However, now VIV_VX_DEBUG_LEVEL is printing the following message: Kernel "com.vivantecorp.extension.evis.resize_nearest_U8toU8_op" does not exist

Not sure if this is relevant but so far it's working better.

Thank you very much.

liyuenan2333 commented 2 years ago

@dmartinez-quercus You don't have to worry about this log; it doesn't matter at all.

michaelnguyen11 commented 2 years ago

Hi @dmartinez-quercus , @liyuenan2333 ,

Sorry for the late response, I've just come back from vacation.

I changed `bool can_resize_to_transposeconv = false` and updated TIM-VX to the latest commit, which includes the #250 MR.

The VX Delegate can now detect very large objects, but it still can't detect small objects, compared to NNAPI. For example: 1/ with the NNAPI Delegate: yolov3_street_nnapi 2/ with the VX Delegate: yolov3_street_vx

The VX Delegate works better, but I think the problem has not been completely solved yet.

dmartinez-quercus commented 2 years ago

> Hi @dmartinez-quercus , @liyuenan2333 ,
>
> Sorry for the late response, I've just come back from vacation.
>
> I changed `bool can_resize_to_transposeconv = false` and updated TIM-VX to the latest commit, which includes the #250 MR.
>
> The VX Delegate can now detect very large objects, but it still can't detect small objects, compared to NNAPI. For example: 1/ with the NNAPI Delegate: yolov3_street_nnapi 2/ with the VX Delegate: yolov3_street_vx
>
> The VX Delegate works better, but I think the problem has not been completely solved yet.

Hmmm... I did not try the latest commit. I just set `bool can_resize_to_transposeconv = false` in the TIM-VX version I already had (early Dec '21), and it worked for YOLOv4-Tiny (2 outputs). However, you are probably using a different YOLO model. If so, you might need to set the VX debugging/profiling environment variables before running the tests, in order to carefully check for any layer inconsistency when TIM-VX builds the model graph, like I did.

bkovalenkocomp commented 2 years ago

Just for completeness, I'll mention my findings here.

Test for my model (INT8 graph): two faces in the image, a big one and a small one.

On x86, the probability scores were 0.99 for the big one and 0.98 for the small one. On the A311D NPU, the scores are 0.74 for the big one and 0.98 for the small one.

The other outputs look fine (landmarks, features), but the difference in the scores is suspicious. Maybe there is a bug in the softmax layer?

Update: in my case, changing can_resize_to_transposeconv makes no difference. I bet on a softmax bug ;-)
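One way to test the softmax hypothesis is to run the same pre-softmax logits through a known-good host implementation and compare with the scores coming back from the NPU. Below is a minimal, numerically stable reference softmax in pure Python; the logits are illustrative only, not taken from the model in this thread.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)                              # shift for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Example: the largest logit should get the largest probability,
# and the probabilities should sum to 1.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

If the host softmax applied to the NPU's pre-softmax tensor matches x86 but the NPU's own softmax output does not, that points squarely at the device-side softmax kernel.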

sunshinemyson commented 2 years ago

The softmax bug has been fixed in TIM-VX.