NVIDIA-AI-IOT / trt_pose

Real-time pose estimation accelerated with NVIDIA TensorRT
MIT License

Models with higher input size #86

Open lweicker opened 4 years ago

lweicker commented 4 years ago

I'm really impressed with the results obtained so far. I'm wondering if you plan to train and publish models but with higher input size? I ask that because I see in tasks/human_pose/experiments/ that you experimented on higher input image sizes like 368x368, 384x384.

Azzedine-Touazi commented 3 years ago

Please, how can we use TRT-pose for high-resolution data (e.g. 1920x1080 images)?

Thanks in advance.

jaybdub commented 3 years ago

Hi all,

We did a few experiments with higher resolutions, but for our primary use case (pose detection within a few meters using a Raspberry Pi camera), we didn't find much qualitative improvement, and the runtime increased. If you want to detect objects within a few meters using a typical camera (say 50+ degree FOV), I'd recommend just downscaling the image before providing it to the neural network.
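To make the downscale-then-infer suggestion concrete, here is a minimal sketch. In practice you would use `cv2.resize` or PIL for quality; the pure-NumPy nearest-neighbor resize below (a hypothetical helper, not part of trt_pose) just shows the idea of shrinking the frame before it reaches the network:

```python
import numpy as np

def downscale_nearest(image: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor downscale of an HxWxC image (no external deps).

    A stand-in for cv2.resize / PIL.Image.resize, used here only to
    illustrate shrinking the frame before inference.
    """
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source col for each output col
    return image[rows[:, None], cols]

# A full-HD frame downscaled to the 224x224 the pre-trained model expects.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
small = downscale_nearest(frame, 224, 224)
print(small.shape)  # (224, 224, 3)
```

Note that resizing 1920x1080 straight to a square input changes the aspect ratio; whether to squish or crop first is discussed further down in this thread.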

That said, you can run the existing pre-trained models at higher resolution, but it will change the effective size range of objects you detect. To do this, you need to adjust the size of the input data you provide to the model when optimizing with TensorRT. This has not been thoroughly tested, so your results may vary.
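To illustrate the "effective size range" point: the network's receptive field is fixed in pixels, so a person at the same distance spans twice as many pixels at double the resolution, and the heatmap/PAF grid grows with the input. A rough back-of-the-envelope sketch (the stride of 4 is my assumption about the backbone's total downsampling; check your model's actual output shape to confirm):

```python
def output_grid(input_size: int, stride: int = 4) -> int:
    """Spatial size of the heatmap/PAF grid for a square input.

    stride=4 is an assumed total downsampling factor for the backbone,
    not a value confirmed by the trt_pose authors.
    """
    return input_size // stride

for size in (224, 448):
    print(size, "->", output_grid(size))
# 224 -> 56
# 448 -> 112
# The receptive field stays fixed in pixels, so at 448x448 nearby people
# may span more pixels than the network was trained to match, while
# distant (previously too-small) people become detectable.
```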

Please let me know if this helps or you have further questions.

Best, John

lweicker commented 3 years ago

Hi John,

Thanks for your answer. Inference with a higher input size does indeed work as you explained. After running multiple tests, however, I have the impression that the predictions are not as good as when I first resize to either 224x224 or 256x256 (depending on the model used) and then run inference.

It also leads me to another question. During training, how did you resize the images? Did you squish them or crop them?
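For concreteness, the two options I mean differ like this (a hypothetical NumPy sketch, not the actual training pipeline; nearest-neighbor resize used only for brevity):

```python
import numpy as np

def squish(img: np.ndarray, size: int) -> np.ndarray:
    """Resize both axes independently to size x size (aspect ratio lost)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]

def crop_then_resize(img: np.ndarray, size: int) -> np.ndarray:
    """Crop the largest centered square, then resize (aspect ratio kept)."""
    h, w = img.shape[:2]
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    return squish(img[top:top + s, left:left + s], size)

img = np.zeros((480, 640, 3), dtype=np.uint8)
print(squish(img, 224).shape)           # (224, 224, 3), people look stretched
print(crop_then_resize(img, 224).shape) # (224, 224, 3), sides discarded
```

Whichever was used in training should presumably be matched at inference time to keep body proportions consistent with what the model saw.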

Best,

Lionel