Da-Capo closed this issue 4 years ago
Hey Da-Capo, thanks for your report.
To start with, I think a batch size of 4 isn't very large. The reason your CPUs are less busy with Polybeast is that the actor forward passes ("inference") happen on the GPU in that case. Options include:
I'm having a similar issue on an Ubuntu machine with 32 CPU cores and 4 V100 GPUs. With monobeast, it only uses 1 GPU and full CPU power, and the frame rate is ~5000 SPS; with polybeast, I set batch_size=16 and num_inference/learner_threads=8, but the frame rate is only ~300 SPS, and only 2 GPUs are running. Were you able to speed up polybeast? Can you share some insight with me? Thanks!
I built the CUDA Docker container like this, and tested monobeast and polybeast with almost the same parameters, shown below:
The result is that polybeast is slower than monobeast: monobeast runs at about 10000 SPS, while polybeast runs at about 3000 SPS. I have checked the GPU and it works fine. monobeast used 100% of every CPU core, but polybeast used only 50% of every CPU core. How can I speed up polybeast?
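For what it's worth, here is a toy sketch of why a small batch size can hurt when inference runs on the GPU. It is not TorchBeast code: `steps_per_second` is a hypothetical helper, and the overhead and per-item costs are made-up assumptions rather than measurements. The idea is only that each inference call pays a fixed cost (launch, host-to-device transfer) that gets amortized over the batch.

```python
def steps_per_second(batch_size, call_overhead_s=1e-3, per_item_s=5e-5):
    """Toy cost model (assumed numbers, not measured from TorchBeast).

    One inference call costs a fixed overhead plus a per-item cost,
    and produces `batch_size` environment steps.
    """
    call_time_s = call_overhead_s + batch_size * per_item_s
    return batch_size / call_time_s


if __name__ == "__main__":
    # Larger batches amortize the fixed per-call overhead.
    for batch_size in (1, 4, 16, 64):
        sps = steps_per_second(batch_size)
        print(f"batch_size={batch_size:3d} -> {sps:8.0f} steps/s")
```

Under this model, throughput rises with batch size and levels off near `1 / per_item_s`, so the gain from a bigger batch depends on how large the fixed overhead really is on your setup. Measuring SPS at a few batch sizes is the only way to know.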