Open josndan opened 1 week ago
In _sample_action of the OffPolicyAlgorithm class, self.predict is called, but the episode_start flag is never set for any off-policy algorithm.
Hello, that's correct: currently only RecurrentPPO makes use of states (LSTM states) and episode starts (to reset those states).
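To illustrate the point, here is a minimal toy sketch in plain Python. It is not SB3 code: `StatelessPolicy` and `RecurrentPolicy` are hypothetical stand-ins, and only the `predict(obs, state, episode_start)` argument names are borrowed from the issue. It shows why a policy without hidden state can safely ignore `episode_start`, while a recurrent one needs it to reset its state.

```python
# Toy illustration only: these classes are hypothetical and are not SB3 code.


class StatelessPolicy:
    """Stands in for a feed-forward policy with no hidden state."""

    def predict(self, obs, state=None, episode_start=None):
        # The action depends only on the current observation,
        # so `state` and `episode_start` are accepted but ignored.
        actions = [2.0 * o for o in obs]
        return actions, state


class RecurrentPolicy:
    """Stands in for an LSTM policy: a running sum plays the role of the hidden state."""

    def predict(self, obs, state=None, episode_start=None):
        n = len(obs)
        if state is None:
            state = [0.0] * n
        if episode_start is None:
            episode_start = [False] * n
        # Reset the hidden state of every environment that starts a new episode.
        state = [0.0 if start else s for s, start in zip(state, episode_start)]
        # Update the hidden state with the new observation.
        state = [s + o for s, o in zip(state, obs)]
        # Here the action is simply a copy of the hidden state.
        return list(state), state


recurrent = RecurrentPolicy()
_, state = recurrent.predict([1.0], state=None, episode_start=[True])
# Same observation, same incoming state, different episode_start flag:
carried, _ = recurrent.predict([1.0], state=state, episode_start=[False])
reset, _ = recurrent.predict([1.0], state=state, episode_start=[True])

stateless = StatelessPolicy()
a1, _ = stateless.predict([1.0], episode_start=[False])
a2, _ = stateless.predict([1.0], episode_start=[True])

print(carried, reset)  # the recurrent policy's output depends on the flag
print(a1, a2)          # the stateless policy's output does not
```

In this sketch the flag changes the recurrent policy's output but has no effect on the stateless one, which mirrors the explanation above: as long as no hidden state is carried between steps, leaving `episode_start` unset changes nothing.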
🐛 Bug
In _sample_action of OffPolicyAlgorithm class, self.predict function is called. But episode_start flag is never set for any off policy algorithms.
To Reproduce
No response
Relevant log output / Error message
No response
System Info
No response
Checklist