chenzx opened this issue 3 years ago (status: Open)
After you instantiate the model, the first few inferences usually take a bit longer. Is that what you are observing?
On Feb 4, 2021, at 2:36 AM, Chen Zhixiang notifications@github.com wrote:
I'm using gRPC to wrap YOLOv4 in an AI model-serving Python script, deployed in an nvidia-docker container instance in the cloud with a Tesla GPU backend;
however, I find that the inference time is NOT consistent: detection normally takes ~670 ms (vs. ~1.3 s for opencv-python on CPU and ~13 s for darknet on CPU, in the same docker instance), but if I make constant RPC calls from a test client, the inference time sometimes improves to ~100 ms.
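A common way to hide that one-time cost is to run a few throwaway inferences right after loading the model, before the server accepts traffic. A minimal warm-up sketch, assuming an OpenCV-DNN YOLOv4 setup with the CUDA backend (the file names and input size here are illustrative, not taken from this thread):

```python
import time
import numpy as np
import cv2

# Hypothetical model files; substitute your own YOLOv4 config/weights.
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)  # requires OpenCV built with CUDA
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# Warm-up: the first few forward passes pay one-time costs
# (CUDA context creation, kernel selection, memory pool growth),
# so run them on dummy input before serving real requests.
dummy = np.zeros((608, 608, 3), dtype=np.uint8)
blob = cv2.dnn.blobFromImage(dummy, 1 / 255.0, (608, 608), swapRB=True, crop=False)
for i in range(5):
    t0 = time.perf_counter()
    net.setInput(blob)
    net.forward(net.getUnconnectedOutLayersNames())
    print(f"warm-up pass {i}: {(time.perf_counter() - t0) * 1000:.1f} ms")
```

In a gRPC server this would run once at startup, so the first real request no longer pays for context creation and kernel selection.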
I observed performance fluctuations in inference time, not only on the first calls, and I suspect they are due to dynamic memory allocation behaviour, but I haven't had time to investigate further.
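One way to narrow this down before digging into allocator behaviour is to collect a latency distribution rather than isolated timings, so steady-state jitter can be separated from one-off warm-up costs. A rough sketch, assuming a callable `detect(image)` that wraps either the gRPC call or the model's forward pass (the name is hypothetical):

```python
import statistics
import time

def profile(detect, image, warmup=10, runs=200):
    """Time `detect` repeatedly and report percentiles, so one-off
    warm-up costs can be separated from steady-state jitter."""
    for _ in range(warmup):  # discard warm-up passes
        detect(image)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        detect(image)
        samples.append((time.perf_counter() - t0) * 1000)  # ms
    samples.sort()
    print(f"p50={statistics.median(samples):.1f} ms  "
          f"p90={samples[int(0.9 * runs)]:.1f} ms  "
          f"max={samples[-1]:.1f} ms")
```

If the steady-state percentiles sit near ~100 ms but calls after an idle gap jump back to ~670 ms, one common cause is the GPU dropping to idle clocks between requests; comparing `nvidia-smi -q -d CLOCK` output during fast and slow calls would confirm or rule that out.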