DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License
8.84k stars 1.68k forks

[Bug]: Episode start flag is never set for off policy algorithms #2011

Open josndan opened 1 week ago

josndan commented 1 week ago

🐛 Bug

In the _sample_action method of the OffPolicyAlgorithm class, self.predict is called, but the episode_start flag is never set for any off-policy algorithm.

To Reproduce

No response

Relevant log output / Error message

No response

System Info

No response

araffin commented 1 week ago

Hello, that's correct, because currently only RecurrentPPO makes use of states (LSTM states) and episode starts (to reset the states).
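
To illustrate why the flag only matters for recurrent policies, here is a minimal sketch of the general idea, not the actual RecurrentPPO code. The helper names mask_states and toy_recurrent_step are hypothetical, and the "LSTM" is replaced by a toy accumulator. An episode start flag zeroes the carried-over state of an environment that was just reset; a feed-forward policy carries no state, so there is nothing to reset.

```python
from typing import List, Sequence


def mask_states(
    states: Sequence[Sequence[float]], episode_starts: Sequence[bool]
) -> List[List[float]]:
    """Zero the recurrent state of every environment whose episode just started.

    Hypothetical helper for illustration only.
    """
    return [
        [0.0] * len(state) if start else list(state)
        for state, start in zip(states, episode_starts)
    ]


def toy_recurrent_step(state: Sequence[float], obs: float) -> List[float]:
    """Hypothetical stand-in for an LSTM cell: add the observation to the state."""
    return [s + obs for s in state]


# Two parallel environments, each with a hidden state of size 2.
states = [[1.0, 2.0], [3.0, 4.0]]
episode_starts = [False, True]  # the second environment was just reset

# Reset first, then advance the recurrent state by one step.
states = mask_states(states, episode_starts)
states = [toy_recurrent_step(s, o) for s, o in zip(states, [0.5, 0.5])]
print(states)  # [[1.5, 2.5], [0.5, 0.5]]
```

The first environment keeps its accumulated state, while the second starts from zeros, so no information leaks from its previous episode.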