DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

Stable Baselines3 PPO optimizer state/policy update slow (need help) #1854

Closed · Dual111 closed this 6 months ago

Dual111 commented 6 months ago

❓ Question

Hello, I just joined this site, so sorry if this is in the wrong place. A little about myself: I'm new to AI and coding. I've been working on AI projects almost daily for about a year with ChatGPT, and everything I've learned so far came from ChatGPT, so be newbie friendly :)

What I'm working on is a program that uses SB3's PyTorch PPO to train an agent, which uses YOLOv5 object detection models, to play the video game League of Legends. Currently it works perfectly; the only problem is that when it reaches the n_steps defined in the model's hyperparameters, it starts the "optimizer state update/policy update" training, or whatever it's called (that's what ChatGPT told me it's called), and it prints the logger table:

```
-----------------------------------------
| time/                   |             |
|    fps                  | 0           |
|    iterations           | 2           |
|    time_elapsed         | 50          |
|    total_timesteps      | 20          |
| train/                  |             |
|    approx_kl            | 0.00850364  |
|    clip_fraction        | 0.09        |
|    clip_range           | 0.2         |
|    entropy_loss         | -21.3       |
|    explained_variance   | -0.0215    |
|    learning_rate        | 5e-05       |
|    loss                 | 1.8         |
|    n_updates            | 10          |
|    policy_gradient_loss | -0.0633     |
|    std                  | 1           |
|    value_loss           | 86.1        |
-----------------------------------------
```
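
For reference, a minimal run that prints this kind of table looks roughly like this (CartPole stands in for my real YOLOv5 screen-capture environment, which is too big to paste here; the learning rate and n_steps match my setup):

```python
import gymnasium as gym

from stable_baselines3 import PPO

# CartPole is only a placeholder so the snippet runs on its own.
env = gym.make("CartPole-v1")

# verbose=1 makes SB3 print the time/ and train/ table above
# after every rollout of n_steps timesteps.
model = PPO("MlpPolicy", env, n_steps=100, learning_rate=5e-5, verbose=1)
model.learn(total_timesteps=1_000)
```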

So the problem is that this optimizer state/policy update is really slow: as the table shows, it runs at 0-1 fps each time. That means if I run the program with n_steps = 100, it takes ~100 seconds to complete the update, and my agent just stands still (AFK in game) for those 100 seconds. I set up CUDA, cuDNN, PyTorch, torchvision, torchaudio, etc. correctly, and the model runs on the GPU via CUDA. Neither my CPU nor my GPU usage goes above 40% during this process, and my memory usage stays fairly low as well.
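
To put rough numbers on what happens during that phase (these are my assumptions about the defaults, since I didn't set batch_size or n_epochs myself):

```python
# Back-of-the-envelope math for one PPO update, assuming the
# SB3 defaults I was (unknowingly) running with.
n_steps = 100     # my rollout length (buffer size with a single env)
n_epochs = 10     # SB3 default: 10 passes over the whole buffer
batch_size = 64   # SB3 default minibatch size

minibatches_per_epoch = -(-n_steps // batch_size)  # ceil division
gradient_steps = n_epochs * minibatches_per_epoch
print(gradient_steps)  # 20 full forward/backward passes per update

# The logged fps counts wall-clock time including train(), so
# 20 total_timesteps over 50 s of time_elapsed rounds down to fps = 0.
```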

Some steps I've tried to fix this issue: increasing batch_size in the hyperparameters, but that doesn't affect it at all; even with batch_size = 100000 there is no difference, so I'm now running without defining batch_size at all. I've also tried changing my detect_object and preprocess_observation functions to use the GPU. I've been tackling this issue for about a week. I've tried editing SB3's PPO source code to let me force the "update optimizer state/policy" (the same thing that happens when n_steps is reached) when stopping training with Ctrl+C, but I haven't gotten it to work. If I could force the update on Ctrl+C, I could set n_steps to a really big number (so it's never reached), and after a League match is played I could stop training, trigger the policy/optimizer update with Ctrl+C, and save the model that way; see the sketch below.
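
What I have in mind looks roughly like this (just a sketch; saving on Ctrl+C already works this way out of the box, it's forcing the pending update before the save that I couldn't manage):

```python
import gymnasium as gym

from stable_baselines3 import PPO

# CartPole stands in for my real environment so this runs on its own.
env = gym.make("CartPole-v1")

# With n_steps set absurdly high, the update never triggers on its
# own during a match; the plan was to force it with Ctrl+C instead.
model = PPO("MlpPolicy", env, n_steps=1_000_000, verbose=1)

try:
    model.learn(total_timesteps=10_000_000)
except KeyboardInterrupt:
    # Ctrl+C lands here mid-rollout: the buffer is only partially
    # filled and no train() pass has run on it yet. Triggering that
    # pass from here is the part I haven't gotten to work.
    pass

model.save("league_ppo")  # writes policy weights + optimizer state to a .zip
```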

The source code I've been trying to edit is located here, if you installed with pip: C:\Users\yourname\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\ppo\ppo.py

Do you have any ideas, or could you help me? If you think you know why the fps is 0-1 during the policy/optimizer state update, could you give some ideas or help to fix it?

And if you were so kind as to actually edit SB3's PPO source code to force the update on detecting Ctrl+C (the same update that happens when n_steps is reached), I would be so thankful!

Best regards and happy day! Dual


araffin commented 6 months ago

Related to https://github.com/DLR-RM/stable-baselines3/issues/715 https://github.com/DLR-RM/stable-baselines3/issues/1565 https://github.com/DLR-RM/stable-baselines3/issues/1059

Dual111 commented 6 months ago

> Related to #715 #1565 #1059

Thank you for the fast reply, @araffin!

Your comment in the mentioned #1059:

> A temporary solution is to set n_steps to a large value (and probably reduce n_epochs then).

For now this seems like a working solution! I hadn't defined n_epochs at all, so it used the default value of 10 from the source code. After adding n_epochs=1, I can set a big n_steps and a big batch_size and still complete the optimizer state/policy update fast! Even though the logger still says fps: 0, an update over 64 n_steps finishes in under a second. You're a hero! Thank you so much :)
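
For anyone who finds this later, the settings that work for me look roughly like this (CartPole again stands in for my real environment, and the exact numbers are just illustrative; the point is n_epochs=1 with a large n_steps and batch_size):

```python
import gymnasium as gym

from stable_baselines3 import PPO

env = gym.make("CartPole-v1")  # placeholder for my custom environment

model = PPO(
    "MlpPolicy",
    env,
    n_steps=8192,     # big rollout, so the update triggers rarely
    batch_size=8192,  # one minibatch covering the whole buffer
    n_epochs=1,       # a single pass instead of the default 10
    verbose=1,
)
model.learn(total_timesteps=100_000)
```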