[pp-humanseg-lite] How to convert dynamic output shape to static output shape?

PaddlePaddle / Paddle2ONNX

ONNX Model Exporter for PaddlePaddle

Apache License 2.0


[pp-humanseg-lite] How to convert dynamic output shape to static output shape? #738

Closed jaehwlee closed 2 years ago

jaehwlee commented 2 years ago


Describe the bug: Hello, I am trying to use the Paddle Human-Seg model in the ONNX form that you uploaded on the model_zoo branch. Unity requires a static input shape to use an ONNX model, so I fixed the input to (1, 3, 224, 398) with onnx_simplifier. However, the output shape is still dynamic, (-1, 2, -1, -1), so the model does not work in Unity. How can I make the output shape static, (1, 2, 224, 398)? The model visualization results are as follows.

Information (please complete the following information):

Inference engine for deployment: Windows 10
Why convert to onnx: For use in Unity
Paddle2ONNX Version: model zoo branch
Email/Wechat/Phone: jaehwlee@gmail.com

Screenshots: [image: paddle_portrait_224_398]


jiangjiajun commented 2 years ago

Try this tool: https://github.com/jiangjiajun/PaddleUtils/tree/main/onnx

python onnx_infer_shape.py --input model.onnx --output new_model.onnx
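For readers who want to see what a step of this kind might involve, here is a minimal sketch using only the public `onnx` Python API. It is not the code of `onnx_infer_shape.py` (the thread does not show that script), and the helper names `parse_shape_spec` and `make_shapes_static` are hypothetical. The idea is to pin the input dimensions, drop any stale shape annotations, and let ONNX shape inference recompute the rest. Whether the outputs end up fully static depends on the model: dimensions that depend on values computed at runtime may stay unknown.

```python
def parse_shape_spec(spec):
    """Split a spec such as 'x:1,3,224,398' into ('x', [1, 3, 224, 398])."""
    name, sep, dims = spec.rpartition(":")
    if not sep or not name or not dims:
        raise ValueError(f"expected 'name:d0,d1,...', got {spec!r}")
    return name, [int(d) for d in dims.split(",")]


def make_shapes_static(src_path, dst_path, spec):
    """Pin one input to a fixed shape, then re-run ONNX shape inference.

    Hypothetical helper: assumes the named input and all outputs are tensors.
    """
    # Imported here so parse_shape_spec stays usable without onnx installed.
    import onnx
    from onnx import shape_inference

    input_name, shape = parse_shape_spec(spec)
    model = onnx.load(src_path)
    graph = model.graph

    # 1. Overwrite the input dimensions with concrete values.
    matches = [i for i in graph.input if i.name == input_name]
    if not matches:
        raise ValueError(f"model has no input named {input_name!r}")
    dims = matches[0].type.tensor_type.shape.dim
    if len(dims) != len(shape):
        raise ValueError(f"input has rank {len(dims)}, spec has rank {len(shape)}")
    for dim, value in zip(dims, shape):
        dim.dim_value = value  # replaces any symbolic dim_param on this dim

    # 2. Drop stale shape annotations so they cannot contradict the new input.
    del graph.value_info[:]
    for out in graph.output:
        out.type.tensor_type.ClearField("shape")

    # 3. Let ONNX recompute intermediate and output shapes where it can.
    inferred = shape_inference.infer_shapes(model)
    onnx.save(inferred, dst_path)
    return inferred
```

A call would look like `make_shapes_static("model.onnx", "new_model.onnx", "x:1,3,224,398")`, where the input name `x` is the one quoted elsewhere in this thread for the portrait model.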

jaehwlee commented 2 years ago

It works! Thank you so much.

dganzella commented 1 year ago

@jaehwlee do you happen to have the working converted model around? I can't seem to get the same results. Even after running the script I get this error:

jaehwlee commented 1 year ago

@dganzella I'm sorry for the late reply. Did you use Barracuda for your inference engine? If you use Barracuda, there are layers that are not implemented, so you may need to modify the model at the code level.

I avoided errors like this by using NatML as the inference engine. In my experience it had a lot of advantages over Barracuda, so I recommend you try it.

girish-d commented 1 year ago

@jaehwlee I'm also trying to get the PP_HumanSeg model (ppseg_lite_portrait_398x224_with_softmax.onnx) into Unity.

But I can't seem to convert the dynamic input shape to static using onnxsim (I used `--overwrite-input-shape "x:1,3,224,398"`). It fails with "[ShapeInferenceError] Inferred shape and existing shape differ in dimension 0: (1) vs (-1)".

Would it be possible for you to list the exact steps you followed to get the "ppseg_lite_portrait_398x224_with_softmax.onnx" into Unity? I'm okay with using NatML or Barracuda.
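The ShapeInferenceError quoted above is easier to read with a toy model of the comparison behind it. The sketch below is a simplified illustration, not ONNX's actual implementation, and `first_conflict` is a hypothetical name. It assumes that an unknown dimension is represented as `None`, while a literal `-1` stored in the model counts as a concrete value. Under that assumption, an inferred batch size of 1 clashes with a stored -1 in dimension 0, which matches the "(1) vs (-1)" in the message and suggests the exported model carries -1 as a literal dimension value.

```python
def first_conflict(inferred, existing):
    """Return (index, inferred_dim, existing_dim) for the first mismatch.

    None marks an unknown dimension and is compatible with anything.
    Any integer, including -1, is treated as a concrete value.
    """
    for index, (new, old) in enumerate(zip(inferred, existing)):
        if new is not None and old is not None and new != old:
            return index, new, old
    return None


# Stored output shape uses literal -1 values: dimension 0 conflicts.
print(first_conflict([1, 2, 224, 398], [-1, 2, -1, -1]))        # (0, 1, -1)

# Stored output shape marks the same dims as unknown: no conflict.
print(first_conflict([1, 2, 224, 398], [None, 2, None, None]))  # None
```

If this reading is right, removing or replacing the stored -1 dimensions before re-running shape inference should avoid the clash.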

jaehwlee commented 1 year ago

@girish-d Hi. I'm sorry, I can't remember the exact steps because it has been a while since I imported that model. What is certain is that I changed it to a static input with the tool below and was able to import the model through NatML right away.

PaddleUtils: https://github.com/jiangjiajun/PaddleUtils/tree/main/onnx
NatML: https://github.com/natmlx/natml-unity

If the license of the image matting model is not a concern for you, try NatML's RVM model. It has state-of-the-art performance and is easy to use.
natml-rvm: https://github.com/natmlx/robust-video-matting-unity

girish-d commented 1 year ago

Thanks for the inputs @jaehwlee. I had a look at the RVM model earlier but GPL3 will not be usable for me, unfortunately.

I did manage to convert to static inputs using ONNXRuntime though: https://onnxruntime.ai/docs/tutorials/mobile/helpers/make-dynamic-shape-fixed.html
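For reference, the linked ONNX Runtime page documents a command-line helper, `onnxruntime.tools.make_dynamic_shape_fixed`, that takes `--input_name` and `--input_shape` followed by the input and output model paths. The sketch below wraps that command in Python; the wrapper names `build_fix_command` and `fix_input_shape` are hypothetical. The exact invocation used in this thread is not shown, so treat this as one plausible way to run it rather than a record of what was done.

```python
import subprocess
import sys


def build_fix_command(src_path, dst_path, input_name, shape):
    """Build the argument list for ONNX Runtime's make_dynamic_shape_fixed tool."""
    return [
        sys.executable, "-m", "onnxruntime.tools.make_dynamic_shape_fixed",
        "--input_name", input_name,
        "--input_shape", ",".join(str(d) for d in shape),
        src_path,
        dst_path,
    ]


def fix_input_shape(src_path, dst_path, input_name, shape):
    """Run the tool; raises CalledProcessError if it exits with an error.

    Requires the onnxruntime package in the current Python environment.
    """
    subprocess.run(build_fix_command(src_path, dst_path, input_name, shape), check=True)
```

With the model named in this thread, a call might be `fix_input_shape("ppseg_lite_portrait_398x224_with_softmax.onnx", "ppseg_static.onnx", "x", [1, 3, 224, 398])`, where the output file name is only a placeholder.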

jaehwlee commented 1 year ago

@girish-d Oh I see. That's why I implemented and trained RVM myself.

Then have you tried Google's MediaPipe? It has been updated again, and its performance keeps improving. I would recommend it the most because it is the fastest and has no license issue. NatML supports it, but MediaPipe was updated recently, so I recommend downloading the MediaPipe model directly and importing it with NatML.

NatML mediapipe: https://github.com/natmlx/meet-unity
mediapipe image matting: https://developers.google.com/mediapipe/solutions/vision/image_segmenter

girish-d commented 1 year ago

@jaehwlee Thanks for the links. Yes, I have used the Selfie Segmentation model from Mediapipe, but I found the hand tracking to be much better with the PP_HumanSeg model.