RunzheYang / MORL

Multi-Objective Reinforcement Learning

Agents are stuck at the green pillar #7


MarvinMo commented 4 years ago

I'm trying to run `run_e3c_double.py`, and I found that all of the agents get stuck at the same green pillar. Any idea what causes this? I have run the code for about 12 hours. Is it possible that I just need to spend more time training the net, or is it caused by instability in the training process?

[screenshot: supermario]

RunzheYang commented 4 years ago

We trained it for more than 4.5k episodes and it took about 1d20h on our machine.

[screenshot: training progress, 2019-12-17]

How did you test your model above? You may need to specify a preference at test time, or infer the underlying preference. You can modify the script in `multimario/scripts/_local/test.sh`.
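For what it's worth, here is a minimal, self-contained sketch of what "specifying a preference" means under linear scalarization, assuming five Mario objectives; the objective ordering and the reward numbers are made up for illustration and are not the repository's exact definitions:

```python
import numpy as np

# Hypothetical preference over five objectives, e.g.
# (x-progress, time, death, coin, enemy) -- ordering is an assumption.
preference = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
preference /= preference.sum()  # keep the weights on the simplex

# Made-up vector reward from one step of the environment.
vector_reward = np.array([2.0, -0.1, 0.0, 5.0, 0.0])

# Linear scalarization: the agent's behavior follows this weighting.
scalar_reward = float(preference @ vector_reward)
print(scalar_reward)  # 2.0 -- only x-progress counts under this preference
```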

You can also try one of our trained models: https://gofile.io/?c=hWzJPM

MarvinMo commented 4 years ago

I appended `--render` to `python run_e3c_double.py --env-id SuperMarioBros-v2 --use-cuda --use-gae --life-done --single-stage --training --standardization --num-worker 16 --sample-size 8 --beta 0.05 --name e3c_b05`. Is that correct? I also noticed that 16 agents are being trained simultaneously. You mentioned that you trained for 4.5k episodes. Did you mean that each agent was trained for 4.5k episodes, or that they were trained for 4.5k episodes in total?

RunzheYang commented 4 years ago

Yes, you can append `--render` if you want to monitor the training process, but it will slow training down. The 4.5k episodes is the episode count for agent 0; the total number of episodes should be about 16 × 4.5k = 72k, since all 16 agents are trained asynchronously.

MarvinMo commented 4 years ago

OK, I'll try it again. Thank you for your patience in answering my questions.

MarvinMo commented 4 years ago

I have tried several times, and the training is really slow: each episode takes more than 1k steps, rather than the ~300 shown in your screenshot. Have you encountered this problem?

[screenshot]

I then added a check so that once the agent has been stuck for more than 30 steps, the environment is restarted. Although this sped up training, the agent always gets stuck at 594m.

[screenshot]
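A minimal sketch of the kind of stuck-detection restart described above, assuming an old-style Gym env whose `info` dict exposes `x_pos` (as gym-super-mario-bros does); the function name, `policy` callable, and `patience` parameter are illustrative:

```python
def run_episode_with_stuck_check(env, policy, patience=30):
    """Run one episode, ending it early if x-progress stalls."""
    obs = env.reset()
    best_x, stalled, done = 0, 0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        x = info.get("x_pos", 0)
        if x > best_x:
            best_x, stalled = x, 0   # new progress: reset the stall counter
        else:
            stalled += 1
        if stalled > patience:       # stuck too long: cut the episode short
            done = True
    return best_x
```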

RunzheYang commented 4 years ago

It is weird that all agents always get stuck at the same position; with exploration the agent should easily get past 594m, unless there is a problem with the environment. What do you see when you run run_n3c.py or run_a3c.py? We never saw this in our previous experiments and may need more information to debug. Also, if the number of steps per episode is too large, then instead of restarting the environment you could try increasing the temperature `--T`, which encourages more exploration.
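To illustrate why a higher temperature encourages exploration, here is a minimal, self-contained sketch of temperature-scaled softmax; this only shows the mechanism, and how `--T` is actually wired into the repository's policy is not shown here:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Higher T flattens the distribution, so rare actions get sampled more."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                 # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, T=0.5))  # sharper: mostly exploits
print(softmax_with_temperature(logits, T=2.0))  # flatter: explores more
```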