Closed Wu-Chenyang closed 4 years ago
Hi @Chengyang-Wu, thanks for your comment. Storing the state particles would not actually give you any additional information about the belief for rollouts. The reason is that rollouts are only performed from leaf nodes that have just been created, so every node used for a rollout holds only a single state particle.
The original paper actually suggests using history-based rollouts, which are quite confusing to program, so we added the option to use the single state particle for a fully observable rollout policy (even though this may overestimate the value, since the rollout policy gets to act on the true state).
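To make the point about single-particle leaves concrete, here is a minimal Python sketch. It is not this package's code: `BeliefNode`, `step`, `state_rollout`, and `simulate` are hypothetical names, the five-state model is a made-up toy, and actions are chosen at random instead of by UCB to keep it short. It only illustrates that, when a rollout starts at a freshly created leaf, that leaf has seen exactly one state.

```python
import random

class BeliefNode:
    """Hypothetical belief node that stores the state particles that reached it."""
    def __init__(self):
        self.particles = []
        self.children = {}  # (action, observation) -> BeliefNode

def step(state, action):
    """Toy generative model (an assumption): returns (next_state, observation, reward)."""
    next_state = (state + action) % 5
    observation = next_state % 2  # the observation only reveals parity
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, observation, reward

def state_rollout(state, depth, rng, gamma=0.95):
    """Fully observable rollout: it is driven by the true state, not by a belief."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        state, _, reward = step(state, rng.choice([1, 2]))
        total += discount * reward
        discount *= gamma
    return total

def simulate(node, state, depth, rng, rollout_sizes, gamma=0.95):
    """One simulation pass; records the particle count of every leaf used for a rollout."""
    if depth == 0:
        return 0.0
    node.particles.append(state)
    action = rng.choice([1, 2])  # random choice stands in for UCB here
    next_state, observation, reward = step(state, action)
    key = (action, observation)
    if key not in node.children:
        # A new leaf is created and the rollout starts from it immediately,
        # so the only particle it can hold is the state that just arrived.
        leaf = BeliefNode()
        leaf.particles.append(next_state)
        node.children[key] = leaf
        rollout_sizes.append(len(leaf.particles))
        return reward + gamma * state_rollout(next_state, depth - 1, rng, gamma)
    child = node.children[key]
    return reward + gamma * simulate(child, next_state, depth - 1, rng, rollout_sizes, gamma)

rng = random.Random(0)
root = BeliefNode()
rollout_sizes = []
for _ in range(200):
    simulate(root, rng.randrange(5), 10, rng, rollout_sizes)
```

In this sketch the root accumulates one particle per simulation, while every entry in `rollout_sizes` is 1, so a belief-based rollout policy called at that point would see nothing beyond the single state.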
Does that make sense?
Thank you very much for your reply. I think your explanation is perfectly correct and it cleared up my confusion.
I notice that in this implementation of POMCP, the state particles of a belief node aren't stored as described in [1]. I find this inconvenient when you want to use a rollout policy that takes in the current belief and generates an action.