[Bug]: Episode start flag is never set for off policy algorithms

DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

MIT License


🐛 Bug

In `_sample_action` of the `OffPolicyAlgorithm` class, `self.predict` is called, but the `episode_start` flag is never set for any off-policy algorithm.
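To illustrate why the flag can matter, here is a minimal, self-contained sketch. It does not use stable-baselines3: `ToyRecurrentPolicy` and `rollout` are hypothetical names, and only the argument names of `predict` (`observation`, `state`, `episode_start`, `deterministic`) are modeled on the library's. It assumes a stateful policy that resets its hidden state when `episode_start` is true, and shows the state carrying over across an episode boundary when the flag is not passed.

```python
from typing import List, Optional, Tuple


class ToyRecurrentPolicy:
    """Hypothetical stand-in for a stateful policy; this is not SB3 code."""

    def predict(
        self,
        observation: List[float],
        state: Optional[List[float]] = None,
        episode_start: Optional[List[bool]] = None,
        deterministic: bool = False,
    ) -> Tuple[List[float], List[float]]:
        n_envs = len(observation)
        if state is None:
            state = [0.0] * n_envs
        if episode_start is None:
            # Assumption for this sketch: a missing flag means "no reset".
            episode_start = [False] * n_envs
        # Zero the per-env hidden state wherever a new episode begins.
        state = [0.0 if start else s for s, start in zip(state, episode_start)]
        # Toy "memory": running sum of observations within an episode.
        new_state = [s + o for s, o in zip(state, observation)]
        actions = list(new_state)
        return actions, new_state


def rollout(policy, observations, episode_starts, pass_flag):
    """Step through the data, optionally forwarding the episode_start flag."""
    state = None
    actions = []
    for obs, starts in zip(observations, episode_starts):
        flag = starts if pass_flag else None
        action, state = policy.predict(obs, state=state, episode_start=flag)
        actions.append(action[0])
    return actions


# Two episodes of two steps each, in a single environment.
observations = [[1.0], [1.0], [1.0], [1.0]]
episode_starts = [[True], [False], [True], [False]]

policy = ToyRecurrentPolicy()
with_flag = rollout(policy, observations, episode_starts, pass_flag=True)
without_flag = rollout(policy, observations, episode_starts, pass_flag=False)

print(with_flag)     # [1.0, 2.0, 1.0, 2.0]
print(without_flag)  # [1.0, 2.0, 3.0, 4.0]
```

In this toy setup the second episode starts from a clean state only when the flag is forwarded. Whether this affects a given off-policy algorithm depends on whether its policy actually keeps state between calls.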

To Reproduce

No response

Relevant log output / Error message

No response

System Info

No response

Checklist

[X] My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
[X] I have checked that there is no similar issue in the repo
[X] I have read the documentation
[X] I have provided a minimal and working example to reproduce the bug
[X] I've used the markdown code blocks for both code and stack traces.

[Bug]: Episode start flag is never set for off policy algorithms #2011
