Closed by benjamc 3 years ago
Initial x as 0 makes sense, I think. Without having ever run it, I can't really say whether the coefficient range is good or not, but it's super easy to adapt, so I think that's fine. So thanks for the benchmarks, I'll merge as soon as you've addressed my comments (the second one is optional).
Any updates here, @benjamc? I found another thing to fix in the meantime: the reset doesn't seem to return a state. I missed that, but it's definitely not desired behavior ;D
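To illustrate the reset point, here is a minimal sketch of what I would expect. `ToyEnv`, `initial_x` and `_get_state` are hypothetical names for illustration, not the actual code in this PR; the only thing that matters is that `reset` ends with a `return`.

```python
class ToyEnv:
    """Hypothetical stand-in for the environment under review."""

    def __init__(self, initial_x=0.0):
        self.initial_x = initial_x
        self.x = initial_x
        self.steps = 0

    def _get_state(self):
        # Assumption: the state is just the current x and the step counter.
        return (self.x, self.steps)

    def reset(self):
        # Restore the internal variables to their starting values ...
        self.x = self.initial_x
        self.steps = 0
        # ... and hand the initial state back to the caller.
        return self._get_state()


env = ToyEnv()
state = env.reset()
```

Without the final `return`, `state` would be `None` and any agent that reads the first observation from `reset` would break.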
Also, we may want to set an upper limit for the values. The lr gets huge pretty fast with actions > 1.
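A rough sketch of what such a cap could look like. This assumes the action scales the lr multiplicatively, which may not match the actual update rule here; `MAX_LR` and `update_lr` are made-up names and the bound of 10.0 is an arbitrary placeholder.

```python
MAX_LR = 10.0  # hypothetical upper limit, to be tuned


def update_lr(lr, action, max_lr=MAX_LR):
    # Assumption: the action multiplies the current learning rate.
    # Capping with min() keeps repeated actions > 1 from blowing it up.
    return min(lr * action, max_lr)


lr = 0.1
for _ in range(10):
    lr = update_lr(lr, 2.0)
```

With ten doublings from 0.1 the uncapped value would be above 100; with the cap it stops at `MAX_LR`.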
Changes from my side:
Things where input would be nice:
As soon as we have settled those two, we can merge.
Please have a look. :-) Especially at the bounds for the coefficients and the initial x (not sure about that).