Timodhau opened 2 years ago:

Hello, I ran ElegantRL with the FinRL processor using the function DRLAgent_erl.DRL_prediction, and it seemed not to be deterministic.
Perhaps the stochasticity is introduced by `env.reset()`.
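If so, seeding the environment should make the rollouts reproducible. A minimal sketch, assuming a Gym-style environment (the `seed` argument to `reset()` exists in recent Gym/Gymnasium releases; older Gym versions use `env.seed()` before `reset()` instead):

```python
import gym

env = gym.make("CartPole-v1")

# Fix the reset RNG so the initial state is identical across runs.
# On older Gym versions: env.seed(42); state = env.reset()
# On Gymnasium, reset() returns (obs, info) instead of obs alone.
state = env.reset(seed=42)
```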
Because both stochastic and deterministic policy algorithms will use a deterministic policy by default during the testing phase.
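For context, this usually means the actor returns the mean (or mode) of its action distribution at test time instead of sampling from it. A minimal sketch with hypothetical names, not ElegantRL's actual API:

```python
import torch

def select_action(actor, state, deterministic=True):
    # Hypothetical actor head returning (mean, std) of a Gaussian policy;
    # this illustrates the idea, not ElegantRL's real signature.
    mean, std = actor(state)
    if deterministic:
        return torch.tanh(mean)  # test time: deterministic mean action
    dist = torch.distributions.Normal(mean, std)
    return torch.tanh(dist.sample())  # training: sample for exploration
```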
Well, I took a look just before:
```python
with _torch.no_grad():
    for i in range(environment.max_step):
        s_tensor = _torch.as_tensor((state,), device=device)
        a_tensor = act(s_tensor)  # action_tanh = act.forward()
        action = a_tensor.detach().cpu().numpy()[0]  # detach() not needed, because torch.no_grad() is active
        state, reward, done, _ = environment.step(action)
```
in the file models.py; my states are similar. Maybe I introduced an error myself, but I don't think so.
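One way to rule that out is to fix every RNG (Python, NumPy, PyTorch, plus the environment), run the prediction twice, and compare the trajectories. A minimal sketch, where `run_prediction` is a hypothetical wrapper standing in for the DRLAgent_erl.DRL_prediction call:

```python
import random
import numpy as np
import torch

def seed_everything(seed: int) -> None:
    # Seed every RNG the agent or environment might draw from.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

def is_deterministic(run_prediction, seed: int = 42) -> bool:
    # run_prediction() is a hypothetical wrapper that performs one rollout
    # and returns the sequence of actions taken.
    seed_everything(seed)
    actions_a = run_prediction()
    seed_everything(seed)
    actions_b = run_prediction()
    return np.allclose(np.asarray(actions_a), np.asarray(actions_b))
```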