Hi! While running your code, I monitored the size of `prediction_queue`. I found that the queue size varies every time `prediction_worker` carries out its computation, and it never reaches the `prediction_queue_size` I set, so the parallelism is far lower than intended. The queue is usually nearly empty when the worker runs. Do you have any ideas on how to increase the parallelism, so that the queue is sufficiently full each time the worker computes a result and the GPU is better utilized?
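For context, the kind of change I have in mind is something like the following sketch: have the worker wait briefly for the queue to fill toward a full batch before launching the GPU computation, instead of predicting on whatever happens to be queued. The names here (`collect_batch`, the `timeout` parameter) are my own assumptions for illustration, not your project's actual API:

```python
import queue
import time

def collect_batch(prediction_queue, batch_size, timeout=0.005):
    """Gather up to batch_size items from the queue.

    Blocks for the first item, then waits up to `timeout` seconds
    for stragglers so the batch handed to the GPU is larger.
    (Hypothetical sketch, not the project's real worker code.)
    """
    batch = [prediction_queue.get()]           # block until at least one request
    deadline = time.monotonic() + timeout
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                              # deadline passed: run with what we have
        try:
            batch.append(prediction_queue.get(timeout=remaining))
        except queue.Empty:
            break                              # no more requests arrived in time
    return batch

def prediction_worker_loop(prediction_queue, batch_size, run_model):
    """Worker loop: repeatedly collect a batch, then run one GPU call on it."""
    while True:
        batch = collect_batch(prediction_queue, batch_size)
        run_model(batch)                       # stand-in for the actual inference call
```

The trade-off is a small added latency per request (bounded by `timeout`) in exchange for larger batches and better GPU utilization. Would something along these lines fit the current design?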