Denys88 / rl_games

RL implementations
MIT License
941 stars 155 forks source link

frames comparison within max epoch if #212

Closed ankurhanda closed 1 year ago

ankurhanda commented 1 year ago

The current a2c_common.py has self.frame >= self.max_frames in the same if with self.epochs >= self.max_epochs https://github.com/Denys88/rl_games/blob/d8645b2678c0d8a6e98a6e3f2b17f0ecfbff71ad/rl_games/common/a2c_common.py#L978-L984

It used to be like this https://github.com/ArthurAllshire/rl_games/blob/master/rl_games/common/a2c_common.py#L1243-L1248

The other issue is that when the if condition returns True it always prints MAX EPOCHS NUM! which is not an accurate description because I was running my code and it would print this even when epochs hadn't reached their max. Need to have a separate if condition saying MAX FRAMES NUM!. Though I think what we had earlier is enough and we don't need to add another check on frames.

ViktorM commented 1 year ago

Fixed in https://github.com/Denys88/rl_games/pull/213

ViktorM commented 1 year ago

@alex-petrenko ^_^

ViktorM commented 1 year ago

@ankurhanda can you confirm the issue is fixed for you?

ankurhanda commented 1 year ago

@ViktorM Looks good https://github.com/Denys88/rl_games/blob/master/rl_games/common/a2c_common.py#L1014-L1032

I will retrain and verify soon.

ankurhanda commented 1 year ago

This is still a problem because max_frames are set in default in the code so if epochs have not reached max_epochs it will still enter the 2nd if condition and end the training.

https://github.com/Denys88/rl_games/blob/master/rl_games/common/a2c_common.py#L1014-L1032

alex-petrenko commented 1 year ago

@ankurhanda can we just set it to something very large by default, like 10 trillion? It is useful to have a stop condition by num frames (i.e. for paper plots where it is customary to report training to 100M frames or smth like that). But it should not interfere with experiments where it's not needed.

ViktorM commented 1 year ago

@ankurhanda in your example, it won't enter the 2nd condition as the default max_frames value is -1.

@alex-petrenko there were many issues with max_frames code. Linear learning rate wasn't updated properly when max_frames, but not max_epochs were set. Misleading message when max_frame was reached, it notified instead that max_epochs were reached and a few cosmetic ones.

alex-petrenko commented 1 year ago

Thank you for fixing these. I was not aware an extra stopping condition can affect learning rate.

ViktorM commented 1 year ago

Extra stopping condition is not affecting. But the linear scheduler itself instead of current_epoch and max_epoch should be using current_frame and max_frames to calculate the current learning rate if we have max_frames as a stopping condition.