YeWR / EfficientZero

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
GNU General Public License v3.0

EfficientZero doesn't seem to be training #10

Closed: SergioArnaud closed this issue 2 years ago

SergioArnaud commented 2 years ago

Hi, first of all congratulations on the great work!

I haven't managed to train an agent yet using the EfficientZero framework. The command I'm using to train is the following:

python3 main.py --env BreakoutNoFrameskip-v4 \
                --case atari \
                --opr train \
                --amp_type torch_amp \
                --num_gpus 4 \
                --num_cpus 32 \
                --cpu_actor 12 \
                --gpu_actor 28 \
                --force \
                --use_priority \
                --use_max_priority \
                --debug

I'm running on a cluster with 4 GPUs and 32 CPUs, matching the --num_gpus and --num_cpus flags above.

The problem I'm facing is that even after training for a while, the only output is the following log:

(pid=52926) A.L.E: Arcade Learning Environment (version +978d2ce)
(pid=52926) [Powered by Stella]
(pid=52926) Start evaluation at step 0.
(pid=52926) Step 0, test scores: 
(pid=52926) [5. 0. 5. 2. 0. 2. 0. 9. 0. 0. 0. 2. 2. 4. 0. 2. 0. 0. 0. 0. 2. 2. 0. 5.
(pid=52926)  0. 0. 0. 5. 2. 0. 2. 5.]

Also, the results folder of the experiment is mostly empty; I only have a train.log with the initial parameters.

I'm not sure if this is just a matter of waiting for a long time or if something in the inner workings is stuck (it looks like the batch_storage in the main train loop is always empty, since we haven't entered the train phase yet).
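To illustrate what I mean (this is just my own sketch, not EfficientZero's actual train loop): if the consumer side only ever polls an empty queue, the process looks completely idle even though nothing has crashed.

```python
# My own illustrative sketch (not EfficientZero's actual code) of the symptom:
# a training loop busy-waiting on a batch queue that the actors never fill.
import queue

batch_storage = queue.Queue(maxsize=20)  # stands in for the shared batch storage

def train_loop():
    while True:
        try:
            batch = batch_storage.get(timeout=1.0)
        except queue.Empty:
            # If the CPU/GPU actors are stuck, this branch repeats forever:
            # no training step runs, no checkpoints are written, and the
            # results folder stays mostly empty, exactly as described above.
            continue
        # ... one optimisation step on `batch` would go here ...
```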

Something I find really weird is that time passes but the GPU Memory-Usage stays exactly the same, which makes me think something is off.
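For reference, a small polling script along these lines (assuming the standalone pynvml package, which is not part of EfficientZero) confirms that memory usage really is flat across all devices:

```python
# Optional diagnostic (not part of EfficientZero) to poll GPU memory from
# outside the training processes; assumes `pip install nvidia-ml-py3`.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

for _ in range(10):  # sample for ~50 seconds
    used = [pynvml.nvmlDeviceGetMemoryInfo(h).used // 2**20 for h in handles]
    print("GPU memory used (MiB):", used)  # flat numbers => actors never started working
    time.sleep(5)

pynvml.nvmlShutdown()
```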

Would appreciate any advice in order to make this work. Thanks in advance!

SergioArnaud commented 2 years ago

I managed to make EfficientZero train

The balance between GPUs, CPUs, GPU actors, and CPU actors matters more than I would have thought. In this case the problem was that, with 28 GPU actors, the GPUs filled up and nothing else could be scheduled, causing an infinite loop in one of the while statements. A hypothetical sketch of this failure mode follows below.
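Here is a minimal, hypothetical Ray sketch of that failure mode; the 0.15 GPU share per actor is an assumption of mine, not EfficientZero's actual actor setup:

```python
# Minimal illustration of over-subscribing fractional GPU actors in Ray.
# The resource numbers are hypothetical; EfficientZero's real actors differ.
import ray

ray.init(num_gpus=4)  # declare 4 logical GPUs, matching --num_gpus 4

@ray.remote(num_gpus=0.15)  # hypothetical per-actor GPU share
class GpuActor:
    def prepare_batch(self):
        return "batch"

# 28 actors * 0.15 GPU = 4.2 GPUs requested, but only 4 are available, so a
# couple of actors stay pending forever; any loop that waits for batches from
# all of them then hangs, while GPU memory usage never changes.
actors = [GpuActor.remote() for _ in range(28)]
print(ray.available_resources())  # shows the leftover GPU share while actors are still pending
```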

I just changed --gpu_actor to 24 and now EfficientZero is training without a problem.

I also upgraded to ray==1.9, which helped a lot with debugging.

SergioArnaud commented 2 years ago

In general, it seems like only about 1 out of every several experiments I launch actually trains; the distributed nature of the agent seems to make training really unstable.

Do you have any recommendations for replicating the runs from the paper?