Closed by linhlpv 3 weeks ago.
Hi, we implemented BC_VGDF with an optimistic policy because, in our early experiments, we found that it performed slightly better in some environments. Intuitively, since the agent has a limited budget, it should be optimistic about the online environment so that it gathers diverse transitions. Setting `optimistic` to `False` in the configs should disable this.
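As a rough illustration of what "being optimistic about the online environment" can look like in code, here is a minimal sketch that assumes an ensemble of Q-functions and a UCB-style bonus (ensemble mean plus `beta` times ensemble standard deviation). This is a generic optimism technique and not necessarily what `bc_vgdf.py` does. `select_online_action`, `beta`, `n_candidates` and `noise_std` are hypothetical names, and the `optimistic` argument only mirrors the name of the config flag.

```python
import numpy as np


def select_online_action(policy_mean, q_ensemble, state, optimistic=True,
                         beta=1.0, n_candidates=16, noise_std=0.1, rng=None):
    """Pick the action used to collect one online transition.

    Hypothetical sketch: the names and defaults here are illustrative,
    not the off-dynamics-rl API.
    """
    rng = np.random.default_rng() if rng is None else rng
    base = np.asarray(policy_mean(state), dtype=float)
    if not optimistic:
        # Non-optimistic behaviour: act with the learned policy directly.
        return base
    # Perturb the policy action to obtain a small set of candidate actions.
    noise = noise_std * rng.standard_normal((n_candidates, base.shape[-1]))
    candidates = np.clip(base + noise, -1.0, 1.0)
    # Score every candidate with every critic: shape (n_critics, n_candidates).
    q = np.stack([np.array([q_fn(state, a) for a in candidates])
                  for q_fn in q_ensemble])
    # Upper-confidence score: mean value plus a bonus for critic disagreement.
    ucb = q.mean(axis=0) + beta * q.std(axis=0)
    return candidates[int(np.argmax(ucb))]
```

With `beta=0` the optimistic branch reduces to picking the candidate with the highest mean Q-value; larger `beta` favours actions the critics disagree on, which is one way to push the agent toward more diverse transitions under a limited budget.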
Hi @dmksjfl, cool, that makes sense. Thank you for your reply.
Hi @OffDynamicsRL ,
Thank you for your amazing repo.
I have taken a look and found that you are using an optimistic policy in the implementation of BC_VGDF: https://github.com/OffDynamicsRL/off-dynamics-rl/blob/bfdf2b2ded15cccc818f1608e51a1cd6baa85239/algo/offline_online/bc_vgdf.py#L769. However, when I checked the authors' original implementation of VGDF, they only used the optimistic policy in the online version. This confuses me a bit.
Could you please explain this implementation choice?
Thank you so much, and have a great day.
Best,
Linh