dharux opened this issue 7 months ago
Hi @dharux,
Thanks for your feedback and the clear explanation of your concerns. Two points:

1. Looking at the code for `q_based_policy.jl`, there is no reason it can only support `TDLearner`, so please feel free to submit an appropriate pull request and I'll review it. :) A minimal sketch of the change is included after this list.
2. `TDLearner` is not an abstract type, and ensuring it supports arbitrary approximators is beyond scope. That said, earlier versions of `TDLearner` supported algorithms beyond `:SARS` and supported `LinearApproximator`s here and here; it would be great to have these re-included in this package ecosystem. The best place at present would be the ReinforcementLearningFarm package (part of this repo). If you want to pursue adding them back, let me know and I can give you some pointers.
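For concreteness, here is a minimal sketch of what point 1 could look like, written as it might appear inside `q_based_policy.jl`, where these abstract types are already in scope (the field names are my assumption; the only change is widening the bound on `L` from `TDLearner` to the existing `AbstractLearner`):

```julia
# Current definition (as quoted in this issue):
# struct QBasedPolicy{L<:TDLearner,E<:AbstractExplorer} <: AbstractPolicy

# Relaxed version: any learner subtyping AbstractLearner is accepted,
# so user-defined learners can plug in without touching this file again.
struct QBasedPolicy{L<:AbstractLearner,E<:AbstractExplorer} <: AbstractPolicy
    learner::L
    explorer::E
end
```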
Thanks for the reply.
I have submitted a pull request for this issue.
I would be interested in adding the algorithms back into the package. Some guidelines would definitely be useful.
The types in the built-in policies and algorithms like `QBasedPolicy` and `TDLearner` are overly specific and prevent users from reusing the existing code to extend it to new algorithms. Instead, users are forced to rewrite large chunks of code.

For example, `QBasedPolicy` is defined as

```julia
struct QBasedPolicy{L<:TDLearner,E<:AbstractExplorer} <: AbstractPolicy
```

and all of its methods are restricted in the same way. Therefore, I cannot write a new learner and use it in a `QBasedPolicy`, even though all of its methods seem to be very general; the sketch below shows the kind of composition this rules out.
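As a self-contained illustration (with local stand-ins for the package's abstract types, and an invented `MyDoubleQLearner`; none of these names are from the package), this is the composition that the `L<:TDLearner` bound currently rejects:

```julia
# Local stand-ins so the sketch runs on its own; in the package these
# abstract types already exist.
abstract type AbstractLearner end
abstract type AbstractExplorer end
abstract type AbstractPolicy end

# QBasedPolicy with the bound relaxed to AbstractLearner.
struct QBasedPolicy{L<:AbstractLearner,E<:AbstractExplorer} <: AbstractPolicy
    learner::L
    explorer::E
end

# A hypothetical user-defined learner; name and fields are invented
# purely for illustration.
struct MyDoubleQLearner{A} <: AbstractLearner
    Q1::A
    Q2::A
end

struct GreedyExplorer <: AbstractExplorer end

# Composes directly under the relaxed bound; under `L<:TDLearner` the
# same call would be a TypeError.
policy = QBasedPolicy(MyDoubleQLearner(zeros(4, 2), zeros(4, 2)), GreedyExplorer())
```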
Another example is `TDLearner`, which is defined as

```julia
Base.@kwdef mutable struct TDLearner{M,A} <: AbstractLearner where {A<:TabularApproximator,M<:Symbol}
```

However, its constructor only allows `M = :SARS`. This forces me to rewrite the whole struct if I want to implement a new TD learning algorithm or use a different kind of approximator (e.g. linear).

In my opinion, these restrictions should be removed and replaced with general types such as `AbstractLearner`. A sketch of one possible relaxation follows.
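For instance, dropping the `:SARS`-only constructor check and dispatching the update rule on the algorithm symbol would let each new TD variant be a single method rather than a whole new struct. This is only a sketch under my own assumptions: `update!` is an invented helper name (not the package's API), and a plain `Matrix` stands in for `TabularApproximator`:

```julia
abstract type AbstractLearner end  # stand-in for the package's abstract type

# TDLearner with no :SARS-only restriction; M is a free algorithm symbol.
Base.@kwdef mutable struct TDLearner{M,A} <: AbstractLearner
    approximator::A
    γ::Float64 = 0.99
    α::Float64 = 0.1
end

# Q-learning-style backup, roughly the existing :SARS behaviour.
function update!(l::TDLearner{:SARS}, s, a, r, s′)
    Q = l.approximator
    Q[s, a] += l.α * (r + l.γ * maximum(Q[s′, :]) - Q[s, a])
end

# An on-policy SARSA variant becomes just one more method on the same struct.
function update!(l::TDLearner{:SARSA}, s, a, r, s′, a′)
    Q = l.approximator
    Q[s, a] += l.α * (r + l.γ * Q[s′, a′] - Q[s, a])
end

learner = TDLearner{:SARSA,Matrix{Float64}}(approximator = zeros(5, 2))
update!(learner, 1, 1, 1.0, 2, 2)  # one SARSA backup on a 5-state, 2-action table
```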