Open MarvinMo opened 4 years ago
We trained it for more than 4.5k episodes and it took about 1d20h on our machine.
And how did you test your model above? You may need to specify a preference during the test or infer the underlying preference. You can modify the script in multimario/scripts/_local/test.sh
You can also try one of our trained models: https://gofile.io/?c=hWzJPM
I added `--render` to the end of `python run_e3c_double.py --env-id SuperMarioBros-v2 --use-cuda --use-gae --life-done --single-stage --training --standardization --num-worker 16 --sample-size 8 --beta 0.05 --name e3c_b05`. Is that correct?
I also noticed that 16 agents are being trained simultaneously. You mentioned that you trained it for 4.5k episodes. Did you mean that each agent was trained for 4.5k episodes, or that they were trained for 4.5k episodes in total?
Yes, you can append `--render` if you want to monitor the training process, but it will slow down training.
4.5k is the number of episodes run by agent 0 alone. The total number of episodes should be ~16*4.5k=72k, since all agents are trained asynchronously.
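The estimate above can be written out as a quick calculation. This is only a sketch: it assumes every worker completes roughly as many episodes as agent 0, which is why the total is approximate. The variable names are illustrative and do not come from the repository.

```python
# Rough total-episode estimate for asynchronous training.
# Assumption: each worker runs about as many episodes as agent 0.
num_workers = 16             # value passed to --num-worker in the command above
episodes_per_worker = 4_500  # episodes reported for agent 0

total_episodes = num_workers * episodes_per_worker
print(f"approximate total episodes: {total_episodes}")
```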
OK, I'll try it again. Thank you for your patience in answering my questions.
I have tried several times and found that training is really slow. Each episode takes more than 1k steps, instead of the roughly 300 shown in your image. Have you encountered this problem?
I then added a check that restarts the environment once the agent has been stuck for more than 30 steps. Although this sped up training, the agent always gets stuck at 594m.
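A stuck check of this kind could look roughly like the sketch below. It is not the code used in this thread. It assumes the environment follows the older Gym 4-tuple `step` API and reports the agent's horizontal position under `info["x_pos"]`, as gym-super-mario-bros does. The class name `StuckReset` and the `stuck` info key are hypothetical. Instead of restarting the environment directly, it marks the episode as done so that the training loop performs the reset.

```python
class StuckReset:
    """Hypothetical wrapper: ends the episode when x_pos stops increasing.

    Assumes env.step() returns (obs, reward, done, info) and that
    info["x_pos"] holds the agent's horizontal position.
    """

    def __init__(self, env, max_stuck_steps=30):
        self.env = env
        self.max_stuck_steps = max_stuck_steps
        self._best_x = None  # furthest position reached this episode
        self._stuck = 0      # consecutive steps without new progress

    def reset(self):
        self._best_x = None
        self._stuck = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        x = info.get("x_pos", 0)
        if self._best_x is None or x > self._best_x:
            # New furthest position: clear the stuck counter.
            self._best_x = x
            self._stuck = 0
        else:
            self._stuck += 1
        if self._stuck > self.max_stuck_steps:
            # Signal episode end; the caller is expected to call reset().
            done = True
            info = dict(info, stuck=True)
        return obs, reward, done, info
```

Note that a check like this only shortens episodes. It does not by itself help the agent get past the obstacle, which is consistent with the agent still stopping at 594m.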
It is weird that all agents always get stuck at the same position, since with exploration the agent should easily go further than 594m, unless there is a problem with the environment. What do you see when you try `run_n3c.py` or `run_a3c.py`? We never saw this in our previous experiments and may need more information to debug. And if the number of steps is too large, increasing the temperature `--T`, which encourages more exploration, might be an alternative to restarting the environment.
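To illustrate why a higher temperature encourages exploration, here is a minimal sketch of softmax action sampling with a temperature parameter. How `--T` is actually used inside the training scripts is not shown in this thread, so treat the scaling below as an assumption. The function name and the example logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """Return action probabilities from logits scaled by 1/T.

    Assumption: the temperature divides the logits before the softmax.
    Larger T flattens the distribution (more exploration).
    """
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / T for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up action preferences
p_low = softmax_with_temperature(logits, T=0.5)   # sharper, greedier
p_high = softmax_with_temperature(logits, T=5.0)  # flatter, more exploratory
print("T=0.5:", [round(p, 3) for p in p_low])
print("T=5.0:", [round(p, 3) for p in p_high])
```

With the larger temperature the probabilities move closer to uniform, so less-preferred actions are sampled more often.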
I'm trying to run 'run_e2c_double.py', and I found that all of the agents got stuck at the same green pillar. Any idea about this problem? I have run the code for about 12 hours. Is it possible that I just need to train the network for longer? Or is it caused by instability in the training process?