Open araffin opened 4 years ago
PS: here is the result using the trained model and deterministic actions:
>print(np.sqrt(s))
9.9
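For readers following along, here is a minimal sketch of how a ratio of standard deviations like the one printed above could be computed. It assumes `s` is the ratio of the variance of the uncontrolled signal to the variance of the controlled one; the thread does not define `s`, so this reading is an assumption. The function name `std_ratio` and both sample signals are hypothetical.

```python
import math
import statistics


def std_ratio(uncontrolled, controlled):
    """Return std(uncontrolled) / std(controlled) via the variance ratio.

    Assumption: 's' in the thread is this variance ratio; it is not
    defined in the original discussion.
    """
    var_ratio = statistics.pvariance(uncontrolled) / statistics.pvariance(controlled)
    return math.sqrt(var_ratio)


# Made-up signals for illustration only (not data from the environment).
baseline = [-10.0, 10.0, -10.0, 10.0]   # oscillation without stimulation
suppressed = [-1.0, 1.0, -1.0, 1.0]     # oscillation with the agent acting

ratio = std_ratio(baseline, suppressed)
print(ratio)
```

A larger ratio means stronger suppression of the oscillation relative to the uncontrolled case.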
Thank you, Antonin!
The first one really looks overfit, but the second one is very cool and converges to the right equilibrium!
With regard to continuous actions, we did try several other algorithms. However, because we want to simulate real DBS devices (which send pulsatile stimuli into the brain), the current configuration is actually closer to real life.
The same applies to the exploration noise. There is a fundamental limit, called finite-size fluctuation, beyond which it is impossible to suppress the network of neurons, so having that noise is actually useful. Whether it is better to first find the best algorithm and then test its stability to noise, or vice versa, is still an open question (see the speculations in "krylov-DBS-RL-paper"), because the environment has a strongly nonlinear response. We will look into your suggestions!
Also, we will get back to you on Monday with pull requests, etc.
Hello,
Nice project =)
I created a colab notebook to try it online directly: https://colab.research.google.com/drive/19bdAiKZY0r5OR3gEv7164CjDOdMRGYqt
Btw, why didn't you use `deterministic=True` for the prediction? (this would suppress the exploration noise)

Quick question: did you try other algorithms that are usually more suited for continuous actions? (like soft actor-critic (SAC), DDPG and TD3, which should be more sample efficient too)
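To illustrate the `deterministic=True` point, here is a toy sketch of what the flag does conceptually for a Gaussian policy: the deterministic prediction returns the mean action, while the default samples around it. The `GaussianPolicy` class is a hypothetical stand-in, not the stable-baselines implementation; with the real library the flag is passed as `model.predict(obs, deterministic=True)`.

```python
import random


class GaussianPolicy:
    """Toy stochastic policy (hypothetical; not the stable-baselines API)."""

    def __init__(self, mean, std, seed=0):
        self.mean = mean
        self.std = std
        self.rng = random.Random(seed)  # seeded for reproducibility

    def predict(self, deterministic=False):
        if deterministic:
            # No exploration noise: always return the mean action.
            return self.mean
        # Stochastic action: sample around the mean.
        return self.rng.gauss(self.mean, self.std)


policy = GaussianPolicy(mean=0.5, std=0.2)

greedy = [policy.predict(deterministic=True) for _ in range(5)]
sampled = [policy.predict() for _ in range(5)]

print(greedy)   # identical values, equal to the mean
print(sampled)  # values scattered around the mean
```

Evaluating with the deterministic action separates what the policy has learned from the effect of the noise it injects while exploring.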
We would also be interested if you could do a pull request on stable-baselines where you add your project to the documentation (project section) ;)
PS: I tried SAC (with the parameters from the original paper) on your environment and got (apparently) good results in 2e5 steps. The plot:
And the ratio of stds: