HRNet / Lite-HRNet

This is an official pytorch implementation of Lite-HRNet: A Lightweight High-Resolution Network.
Apache License 2.0

Inference time is surprisingly long #35

Open huilongan opened 3 years ago

huilongan commented 3 years ago

Tested on a V100 GPU with a high-end CPU. For litehrnet_18_coco_256x192, the inference latency is > 200 ms.
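
For context, a minimal timing sketch (not from the thread) that isolates the backbone forward pass from Python-side pre/post-processing. The config/checkpoint paths and the mmpose 0.x init_pose_model API are assumptions based on this repo's demo scripts; adjust them to your setup:

```python
# Hedged sketch: time only the backbone forward pass on GPU, with warm-up and
# explicit CUDA synchronization, so data-pipeline overhead is excluded.
import time
import torch
from mmpose.apis import init_pose_model  # mmpose 0.x API used by this repo's demos

# Paths below are assumptions; point them at your local config/checkpoint.
config = 'configs/top_down/lite_hrnet/coco/litehrnet_18_coco_256x192.py'
checkpoint = 'litehrnet_18_coco_256x192.pth'

model = init_pose_model(config, checkpoint, device='cuda:0')
model.eval()

dummy = torch.randn(1, 3, 256, 192, device='cuda:0')  # input size of this config

with torch.no_grad():
    for _ in range(20):                 # warm-up: cuDNN autotuning, kernel caching
        model.backbone(dummy)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model.backbone(dummy)
    torch.cuda.synchronize()            # wait for all kernels before stopping the clock
    mean_ms = (time.perf_counter() - start) / 100 * 1e3

print(f'mean backbone latency: {mean_ms:.1f} ms')
```

If this number comes out far below 200 ms, most of the reported latency is coming from the surrounding pipeline rather than the network itself.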

GentleTel commented 3 years ago

Can you speak Chinese? I'm running into the same problem as you!

WingsOfPanda commented 3 years ago

Any update on this issue? I tested on an A10 card and the inference latency is pretty long as well...

GentleTel commented 3 years ago

No, I think the main reason is a communication bottleneck.

IanUJo commented 3 years ago

Do you use inference_top_down_pose_model()? Inside this method (see from mmpose.apis.inference import _inference_single_pose_model), moving the input tensor onto the CUDA device slows things down a bit, and some more slowdown occurs in the test_pipeline(data) part of the method.
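
For reference, a rough sketch (mine, not from the thread) that times the full inference_top_down_pose_model() call on a dummy frame; comparing it with a model-only measurement shows how much of the latency goes into test_pipeline(data) and the CPU-to-GPU transfer. The person_results format follows mmpose 0.x and may need adjusting for your version:

```python
# Hedged sketch: measure end-to-end latency of the mmpose top-down API so the
# pipeline/transfer overhead described above can be separated from model time.
import time
import numpy as np
from mmpose.apis import init_pose_model, inference_top_down_pose_model

config = 'configs/top_down/lite_hrnet/coco/litehrnet_18_coco_256x192.py'  # assumed path
checkpoint = 'litehrnet_18_coco_256x192.pth'                              # placeholder name
model = init_pose_model(config, checkpoint, device='cuda:0')

img = np.zeros((480, 640, 3), dtype=np.uint8)                  # dummy BGR frame
person_results = [{'bbox': np.array([0, 0, 640, 480, 1.0])}]   # one full-frame detection

inference_top_down_pose_model(model, img, person_results, format='xyxy')  # warm-up

start = time.perf_counter()
for _ in range(50):
    inference_top_down_pose_model(model, img, person_results, format='xyxy')
end_to_end_ms = (time.perf_counter() - start) / 50 * 1e3

print(f'end-to-end latency: {end_to_end_ms:.1f} ms')
# The gap between this number and a backbone-only timing is the preprocessing,
# test_pipeline(data), and host-to-device transfer cost mentioned above.
```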

welleast commented 3 years ago

This is a good point. PyTorch does not handle multi-branch structures well, so the inference time is somewhat long. With a careful implementation on CPU, the runtime speedup matches the FLOPs-based speedup: in our product, the theoretical speedup is 3.7x and the measured runtime speedup is ~3.5x.
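
One common way to cut the per-module Python dispatch overhead of a many-branch network, sketched below, is tracing the backbone with TorchScript (or exporting it to ONNX for an optimized CPU runtime). This is my own illustration, not necessarily what is meant by "careful implementation" here, and the paths are assumptions:

```python
# Hedged sketch: reduce framework overhead for the multi-branch backbone by
# tracing it with TorchScript; optionally export to ONNX for CPU runtimes.
import torch
from mmpose.apis import init_pose_model  # mmpose 0.x API; paths below are assumptions

config = 'configs/top_down/lite_hrnet/coco/litehrnet_18_coco_256x192.py'
checkpoint = 'litehrnet_18_coco_256x192.pth'
backbone = init_pose_model(config, checkpoint, device='cpu').backbone.eval()

dummy = torch.randn(1, 3, 256, 192)

# strict=False lets tracing accept the list of multi-resolution outputs.
traced = torch.jit.trace(backbone, dummy, strict=False)
traced = torch.jit.freeze(traced)               # fold constants, inline submodules
traced.save('litehrnet18_backbone_traced.pt')   # hypothetical output file name

# Optional: ONNX export for runtimes such as ONNX Runtime / OpenVINO on CPU.
torch.onnx.export(backbone, dummy, 'litehrnet18_backbone.onnx', opset_version=11)
```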

iiou16 commented 3 years ago

@welleast Could you tell me the main points of "careful implementation"? And, if the multi-branch structure is a problem, will HRNet be just as fast with a similar implementation?