Optimistic policy in BC_VGDF #2

linhlpv commented 3 weeks ago

Hi @OffDynamicsRL ,

Thank you for your amazing repo.

I took a look and found that the BC_VGDF implementation uses an optimistic policy: https://github.com/OffDynamicsRL/off-dynamics-rl/blob/bfdf2b2ded15cccc818f1608e51a1cd6baa85239/algo/offline_online/bc_vgdf.py#L769 . However, when I checked the authors' original VGDF implementation, the optimistic policy is used only in the online version, so I am a bit confused.

Could you please explain this implementation choice?

Thank you so much and have a great day. Best, Linh

dmksjfl commented 3 weeks ago

Hi, we implemented BC_VGDF with an optimistic policy because we found it performs slightly better in some environments during our early experiments. Intuitively, since the agent has a limited budget of online interactions, it should be optimistic about the online environment in order to gather diverse transitions. Setting optimistic to False in the configs should disable this.
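
For illustration only, a toggle like this could be sketched as below. This is a minimal sketch, not the code in bc_vgdf.py: the names AgentConfig, select_behavior_policy, main_policy, and optimistic_policy are hypothetical, and the two policies are stand-ins that return fixed actions. It only shows the idea that the flag decides which policy collects online transitions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A policy maps a state (sequence of floats) to an action (list of floats).
Policy = Callable[[Sequence[float]], List[float]]


@dataclass
class AgentConfig:
    # Hypothetical config field mirroring the "optimistic" flag mentioned above.
    optimistic: bool = True


def main_policy(state: Sequence[float]) -> List[float]:
    # Stand-in for the learned target policy (assumption, not the repo's code).
    return [0.0 for _ in state]


def optimistic_policy(state: Sequence[float]) -> List[float]:
    # Stand-in for a separate exploration policy (assumption, not the repo's code).
    return [1.0 for _ in state]


def select_behavior_policy(config: AgentConfig, main: Policy, optimistic: Policy) -> Policy:
    # Pick the policy used to gather online transitions.
    return optimistic if config.optimistic else main


# Example: with the flag disabled, data collection falls back to the main policy.
behavior = select_behavior_policy(AgentConfig(optimistic=False), main_policy, optimistic_policy)
action = behavior([0.1, 0.2, 0.3])
```

With the flag set to False, only the main policy is used for collecting data in this sketch.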

linhlpv commented 3 weeks ago

Hi @dmksjfl, cool, that makes sense. Thank you for your reply.
