aqeelanwar / PEDRA

Programmable Engine for Drone Reinforcement Learning Applications
MIT License

Reading custom weights #24

Closed: SumukhaNadig closed this issue 4 years ago

SumukhaNadig commented 4 years ago

Hi, my training crashed after 30,000 iterations or so and I saved the weights before quitting the program. The next time I read the weights using the custom_path, it looks like the drone is starting all over again, in that it doesn't reach the distances it did before I saved the weights. I just want to know if I'm doing it the right way. My folder structure is C:\Users\Sumukha\Desktop\Flight\DRL\PEDRA\models\trained\Indoor\indoor_long\Imagenet\e2e\drone0, and under drone0 I have the drone0_user .data, .index and .meta files. In my DeepQLearning.cfg, I have given the path as models/trained/Indoor/indoor_long/Imagenet/e2e/drone0/drone0_user. This will read all three files, right? Also, I was wondering if there is any difference between giving the custom_load in config.cfg versus DeepQLearning.cfg.
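(For context, a TensorFlow 1.x-style checkpoint is addressed by its prefix, and the restore call picks up the sibling .data and .index files automatically. A minimal sketch of that mechanism, outside PEDRA and assuming TF1-style checkpoints:)

```python
# Minimal sketch (not PEDRA's actual loading code), assuming TensorFlow 1.x
# style checkpoints: passing the prefix path to the Saver makes it pick up
# the matching .data* and .index files; the .meta file rebuilds the graph.
import tensorflow as tf

prefix = "models/trained/Indoor/indoor_long/Imagenet/e2e/drone0/drone0_user"

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(prefix + ".meta")  # rebuilds the graph
    saver.restore(sess, prefix)  # restores weights from the .data/.index pair
```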

Is there a way you would recommend to check whether the weights were actually loaded from the custom path? The console does display that it read the weights, but the drone's movement doesn't support it.

aqeelanwar commented 4 years ago

Hello. There is a difference between saving the network weights and saving the entire state of the training. The code only saves the weights of the network when you hit the 'Enter' key. It doesn't save variables such as the number of iterations, current epsilon, number of crashes, episodes, etc.

When you run the program again with custom_load, it does initialize the network with the custom weights, but the code is meant to start Q-learning from iteration = 0. The entire training begins again.
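(If someone wanted to resume that bookkeeping as well, the extra training state would have to be persisted separately alongside the weight files. A rough sketch of the idea, not a PEDRA feature, with illustrative variable names:)

```python
# Illustrative only: persist the training bookkeeping that the weight files
# do not contain, so a later run could pick up where it left off.
import pickle

state = {"iteration": 30000, "epsilon": 0.4, "crashes": 120}  # example values

with open("training_state.pkl", "wb") as f:
    pickle.dump(state, f)      # save next to the checkpoint files

with open("training_state.pkl", "rb") as f:
    restored = pickle.load(f)  # reload before continuing training

print(restored["iteration"])
```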

What do you mean by 'my training crashed'? The code does support recovering from environment crashes. If you mean that your environment crashed while your Python code was still running, you can recover by launching the .exe file of the environment manually (unreal_envs\indoor_long\indoor_long.exe). Once the environment starts, go back to your PyGame screen and hit the 'r' key. This will attempt to reconnect with the environment. When your terminal displays that it has connected with the environment, you can use the 'backspace' key to resume training from that point.
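(The key handling is roughly along these lines; a simplified sketch, not PEDRA's actual event loop, with reconnect_to_env and resume_training as placeholder names:)

```python
# Simplified PyGame key loop wired to hypothetical reconnect/resume callbacks.
import pygame

def reconnect_to_env():
    print("Attempting to reconnect to the environment...")  # placeholder

def resume_training():
    print("Resuming training from the current point...")    # placeholder

pygame.init()
screen = pygame.display.set_mode((320, 240))

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN:
            if event.key == pygame.K_r:            # 'r' -> try to reconnect
                reconnect_to_env()
            elif event.key == pygame.K_BACKSPACE:  # 'backspace' -> resume
                resume_training()
pygame.quit()
```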

SumukhaNadig commented 4 years ago

Yes, the environment crashed, and I did relaunch the .exe file and pressed 'r' on the PyGame screen, but it wasn't able to reconnect. It gave me a timeout error, and after that happened I couldn't run main.py again until a reboot, as I kept getting the same timeout error.

I got the part about the iteration count starting from zero, but I was talking more about the performance of the agent itself. For example, when I saved the model I was consistently getting 25-30 steps without a crash, and when I reloaded the same weights it felt like I was starting all over again with the Imagenet weights, as I only got around 5-10 steps without a crash.

aqeelanwar commented 4 years ago

That is because of the epsilon-greedy method. When training starts, the drone takes more random actions and fewer predicted actions. Even if the loaded weights are good, the actions they predict are rarely executed; most of the time random actions are taken instead. Have a look at the policy module in aux_functions.py.
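(The idea behind that policy is ordinary epsilon-greedy selection; a generic sketch follows. PEDRA's real version is the policy function in aux_functions.py and differs in detail:)

```python
# Generic epsilon-greedy action selection, for illustration only.
import random

def select_action(q_values, epsilon):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# Early in training epsilon is close to 1, so even good weights rarely get to act:
print(select_action([0.1, 0.7, 0.2, 0.0], epsilon=0.9))
```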

If you simply want to look at the performance of the saved weights, you can run the code in the infer mode.

SumukhaNadig commented 4 years ago

Okay, that makes sense, thank you. Based on your experience, how many iterations does it usually take for the program to stop preferring random actions over the ones suggested by the network? Also, is there any way to change this so that the weights converge faster?

aqeelanwar commented 4 years ago

In the DeepQLearning.cfg file, there is a parameter epsilon_saturation. It specifies the iteration number at which the probability of taking the predicted action reaches 95 percent. Until then, this probability follows a linear or exponential schedule (selected by the epsilon_model parameter) that ramps from 0 all the way up to 95%.
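(In other words, the probability of acting on the network's prediction ramps up with the iteration count. A small sketch of such a schedule; the exact formulas PEDRA uses may differ, this only mirrors the linear/exponential description above:)

```python
# Sketch of a linear or exponential ramp that reaches roughly a 95% probability
# of predicted actions at iteration == epsilon_saturation. Illustrative only.
import math

def predicted_action_prob(iteration, epsilon_saturation, model="linear"):
    if model == "linear":
        return 0.95 * min(iteration / epsilon_saturation, 1.0)
    # exponential ramp that approaches 0.95 near epsilon_saturation
    return 0.95 * (1.0 - math.exp(-3.0 * iteration / epsilon_saturation))

for it in (0, 25000, 50000, 100000):
    print(it, round(predicted_action_prob(it, epsilon_saturation=100000), 3))
```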

Usually it takes about 100k steps for the Q-learning to converge.

SumukhaNadig commented 4 years ago

Okay, I will try it out, thanks. Also, by the way, could you check the outdoor_courtyard environment? The .rar file looks corrupt, as it gives a checksum error during extraction.

I was also wondering: does setting the ClockSpeed to a higher value increase the number of training iterations in the same unit of time, or is it related only to rendering on the screen?

aqeelanwar commented 4 years ago

Thanks for pointing that out. I have updated the outdoor_courtyard.zip file. It works now.

ClockSpeed only has to do with the physics of the environment and the drone. ClockSpeed=1 corresponds to real-world physics. With a higher ClockSpeed, values such as drone velocity and gravity are scaled up accordingly.
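(For reference, ClockSpeed is an AirSim simulation setting, and PEDRA runs on AirSim; it is normally set in AirSim's settings.json. The 5.0 below is just an example value:)

```json
{
  "SettingsVersion": 1.2,
  "SimMode": "Multirotor",
  "ClockSpeed": 5.0
}
```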