YeWR / EfficientZero

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
GNU General Public License v3.0

Training is really slow #12

Open SergioArnaud opened 2 years ago

SergioArnaud commented 2 years ago

First of all, congratulations on the great work!

I've been trying to train an agent to play Breakout, and training is really slow. This confuses me because, according to the paper, a full training run of 100k steps should take 7 hours. My experience has been different:

Running time

Hardware:

Running command

python main.py --env atari \
               --case BreakoutNoFrameskipv4 \
               --opr train \
               --amp_type torch_amp \
               --num_gpus 4 \
               --num_cpus 80 \
               --cpu_actor 5 \
               --gpu_actor 13 \
               --seed 2917 \
               --force \
               --use_priority \
               --use_max_priority \
               --debug \
               --p_mcts_num 1

Do you have any idea or advice so that we can optimize the runtime?

@YeWR

YeWR commented 2 years ago

You could try more CPU and GPU actors, for example --cpu_actor 14 --gpu_actor 20. Since you have 4 RTX6000s and each one has more than 20GB of memory, I think the original bash script train.sh should run on your machine.
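
For illustration, here is a minimal sketch of the reported command with only the two actor counts raised to the values suggested above. Every other flag is copied unchanged from the report as written. The variable names CPU_ACTORS, GPU_ACTORS and CMD are hypothetical helpers, not part of the repository, and whether these counts fit in memory on this machine is untested.

```shell
# Actor counts suggested in the reply (the report used 5 and 13).
CPU_ACTORS=14
GPU_ACTORS=20

# All remaining flags are copied unchanged from the reported command.
CMD="python main.py --env atari --case BreakoutNoFrameskipv4 --opr train \
--amp_type torch_amp --num_gpus 4 --num_cpus 80 \
--cpu_actor $CPU_ACTORS --gpu_actor $GPU_ACTORS \
--seed 2917 --force --use_priority --use_max_priority --debug --p_mcts_num 1"

# Print the assembled command for review; to launch it, run: eval "$CMD"
echo "$CMD"
```

Keeping the two counts in variables makes it easy to step them up gradually and watch GPU memory, rather than jumping straight to the largest values.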

Hope this can help you :)