I have a 12-core PC with a 4090, and it seems like the code is bottlenecked on BatchWorker - all my CPUs are at 100%, while my GPU is mostly underutilized. If I'm reading the code right, the BatchWorkers are doing rollouts with the trained model - is that right? Is this expected, or is there a trick to optimizing it? I'm running scripts/train.sh.
Yeah, you are right. You can modify the number of BatchWorkers and DataWorkers, but that may increase training time. If you want to reduce CPU load, you can reduce the number of DataWorkers.
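A minimal sketch of how you might size the worker pools to your machine (the function `choose_worker_counts` and the split between the two pools are hypothetical - check the repo's config for the actual parameter names; the repo itself does not expose this helper):

```python
import os

def choose_worker_counts(reserve_cores: int = 2):
    """Pick BatchWorker/DataWorker counts that leave some cores free.

    Leaves `reserve_cores` CPUs for the main training loop and the OS,
    then splits the remaining cores evenly between the two worker pools.
    """
    total = os.cpu_count() or 1
    available = max(1, total - reserve_cores)
    num_batch_workers = max(1, available // 2)              # rollout workers
    num_data_workers = max(1, available - num_batch_workers)  # data workers
    return num_batch_workers, num_data_workers

batch_workers, data_workers = choose_worker_counts()
print(batch_workers, data_workers)
```

On a 12-core machine with 2 cores reserved, this would assign 5 workers to each pool; lowering either count trades CPU pressure for slower training, as noted above.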