NeroBlackstone opened 1 year ago
@NeroBlackstone sorry that we never responded to this! This is actually something that people often want to do. If you're still interested in contributing it, I think we can integrate it in with a few small adjustments. Let me know if you're interested in doing that.
Hi, thanks for your comment. I'm ready to contribute to this feature.
I will do these things:
1. Open a PR that contributes DictPolicy and some test code to POMDPs.jl.
2. After DictPolicy is merged, contribute a Q-Learning solver and Prioritized Sweeping using this policy in TabularTDLearning.jl.

I will open a PR for the first step soon. If there are problems with the code, please point them out.
Thank you very much again.
Suppose we have a discrete-state, discrete-action, generative MDP whose state and action spaces are hard to enumerate, but we still want to solve it with a traditional tabular RL algorithm. So I implemented a DictPolicy, which stores state-action pair values in a dictionary. (Users do need to define Base.isequal() and Base.hash() for their state and action types.)

DictPolicy.jl:
Then we have a special Q-learning based on key-value storage, so we don't need to enumerate the state and action spaces in the MDP definition. (Most of the code is copied from TabularTDLearning.jl; only the Q-value storage and reads are changed.)

dict_q_learning.jl:
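The listing for this file is also missing. As a rough sketch of what dictionary-backed Q-learning looks like, here is a self-contained version; the `step(s, a) -> (s′, r)` generative interface and the `actions(s)` function are assumptions for illustration, not the actual TabularTDLearning.jl API.

```julia
# Hedged sketch of tabular Q-learning where Q-values are read and written
# through a Dict instead of a pre-allocated matrix, mirroring the
# key-value storage idea described above. Unseen (s, a) pairs default to
# a value of 0.0 via `get`.

function dict_q_learning!(Q::Dict, step, actions, s0;
                          n_episodes=100, max_steps=100,
                          α=0.1, γ=0.95, ϵ=0.1)
    for _ in 1:n_episodes
        s = s0
        for _ in 1:max_steps
            acts = actions(s)
            # ϵ-greedy selection over the currently known Q-values
            a = rand() < ϵ ? rand(acts) :
                argmax(a -> get(Q, (s, a), 0.0), acts)
            s′, r = step(s, a)
            # standard TD target: max over the next state's actions
            best_next = maximum(a′ -> get(Q, (s′, a′), 0.0), actions(s′))
            q = get(Q, (s, a), 0.0)
            Q[(s, a)] = q + α * (r + γ * best_next - q)
            s = s′
        end
    end
    return Q
end
```

Because the table is a Dict, only the (state, action) pairs actually visited ever take up memory, which is the whole point when the spaces are hard to enumerate.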
What's your point of view? Do you have any advice? Thank you for taking the time to read my issue. If you think it's meaningful, I can open a PR and add some tests. It's also okay if you think it isn't useful or general enough; I just finished it to solve my own MDP.