YeWR / EfficientZero

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
GNU General Public License v3.0

EfficientZero doesn't seem to be training #10

Closed: SergioArnaud closed this issue 2 years ago

SergioArnaud commented 2 years ago

Hi, first of all congratulations on the great work!

I haven't managed to train an agent yet using the EfficientZero framework. The command I'm using to train is the following:

python3 main.py --env BreakoutNoFrameskip-v4 \
                --case atari \
                --opr train \
                --amp_type torch_amp \
                --num_gpus 4 \
                --num_cpus 32 \
                --cpu_actor 12 \
                --gpu_actor 28 \
                --force \
                --use_priority \
                --use_max_priority \
                --debug

I'm running on a cluster with 4 GPUs and 32 CPUs, matching the --num_gpus and --num_cpus flags above.

The problem I'm facing is that even after training for a while, the only output is the following log:

(pid=52926) A.L.E: Arcade Learning Environment (version +978d2ce)
(pid=52926) [Powered by Stella]
(pid=52926) Start evaluation at step 0.
(pid=52926) Step 0, test scores: 
(pid=52926) [5. 0. 5. 2. 0. 2. 0. 9. 0. 0. 0. 2. 2. 4. 0. 2. 0. 0. 0. 0. 2. 2. 0. 5.
(pid=52926)  0. 0. 0. 5. 2. 0. 2. 5.]

Also, the results folder of the experiment is mostly empty; I only have a train.log with the initial parameters.

I'm not sure if this is just a matter of waiting for a long time or if something in the inner workings is stuck (it looks like the batch_storage in the main train loop is always empty, since we haven't entered the train phase yet).
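To illustrate what I mean (this is just my own sketch, not EfficientZero's actual train loop): if the consumer side only ever polls an empty queue, the process looks completely idle even though nothing has crashed.

```python
# My own illustrative sketch (not EfficientZero's actual code) of the symptom:
# a training loop busy-waiting on a batch queue that the actors never fill.
import queue

batch_storage = queue.Queue(maxsize=20)  # stands in for the shared batch storage

def train_loop():
    while True:
        try:
            batch = batch_storage.get(timeout=1.0)
        except queue.Empty:
            # If the CPU/GPU actors are stuck, this branch repeats forever:
            # no training step runs, no checkpoints are written, and the
            # results folder stays mostly empty, exactly as described above.
            continue
        # ... one optimisation step on `batch` would go here ...
```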

Something I find really weird is that time passes but the GPU Memory-Usage stays exactly the same, which makes me think something is off.
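For reference, a small polling script along these lines (assuming the standalone pynvml package, which is not part of EfficientZero) confirms that memory usage really is flat across all devices:

```python
# Optional diagnostic (not part of EfficientZero) to poll GPU memory from
# outside the training processes; assumes `pip install nvidia-ml-py3`.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

for _ in range(10):  # sample for ~50 seconds
    used = [pynvml.nvmlDeviceGetMemoryInfo(h).used // 2**20 for h in handles]
    print("GPU memory used (MiB):", used)  # flat numbers => actors never started working
    time.sleep(5)

pynvml.nvmlShutdown()
```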

Would appreciate any advice in order to make this work. Thanks in advance!

SergioArnaud commented 2 years ago

I managed to make EfficientZero train

The balance between GPUs, CPUs, GPU actors, and CPU actors matters more than I would have thought. In this case the problem was that, with 28 GPU actors, the GPUs filled up and nothing else could be scheduled, causing an infinite loop in one of the while statements. A hypothetical sketch of this failure mode follows below.
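Here is a minimal, hypothetical Ray sketch of that failure mode; the 0.15 GPU share per actor is an assumption of mine, not EfficientZero's actual actor setup:

```python
# Minimal illustration of over-subscribing fractional GPU actors in Ray.
# The resource numbers are hypothetical; EfficientZero's real actors differ.
import ray

ray.init(num_gpus=4)  # declare 4 logical GPUs, matching --num_gpus 4

@ray.remote(num_gpus=0.15)  # hypothetical per-actor GPU share
class GpuActor:
    def prepare_batch(self):
        return "batch"

# 28 actors * 0.15 GPU = 4.2 GPUs requested, but only 4 are available, so a
# couple of actors stay pending forever; any loop that waits for batches from
# all of them then hangs, while GPU memory usage never changes.
actors = [GpuActor.remote() for _ in range(28)]
print(ray.available_resources())  # shows the leftover GPU share while actors are still pending
```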

I just changed --gpu_actor to 24 and now EfficientZero is training without a problem.

I also upgraded to ray==1.9, which helped a lot with debugging.

SergioArnaud commented 2 years ago

In general, it seems like only about 1 out of every several experiments I launch actually trains; the distributed nature of the agent seems to make training really unstable.

Do you have any recommendations for replicating the runs from the paper?