MishaLaskin / curl

CURL: Contrastive Unsupervised Representation Learning for Sample-Efficient Reinforcement Learning
MIT License

Questions about infinite bootstrap #22

Open geekyutao opened 2 years ago

geekyutao commented 2 years ago

Hi, thank you for your code. I'm a little confused about the infinite bootstrap in https://github.com/MishaLaskin/curl/blob/8416d6e3869e38ca0e46fcbc54a2f784dc09d7fc/train.py#L269 . Won't it be wrong when sampling a transition at the end of an episode (where next_obs is the start observation of the next episode)? It seems you simply ignore this case.

yueyang130 commented 2 years ago

It seems that in DMControl there is no true terminal state: episodes end only because of the step limit. So infinite bootstrapping is allowed.
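To make the point concrete, here is a minimal sketch of how a time-limit mask interacts with a one-step TD target. It assumes the linked line sets the stored done flag to 0 when the episode ends at the step limit; the helper names `time_limit_done_bool` and `td_target` and the discount value are hypothetical and not taken from the repo.

```python
def time_limit_done_bool(done, episode_step, max_episode_steps):
    """Stored done flag: 0 when the episode ends only because of the step limit.

    Assumption: this mirrors the masking discussed in this thread.
    """
    return 0.0 if episode_step + 1 == max_episode_steps else float(done)


def td_target(reward, next_value, done_bool, discount=0.99):
    """One-step TD target; bootstraps from next_value unless done_bool is 1."""
    not_done = 1.0 - done_bool
    return reward + discount * not_done * next_value


# At the step limit the flag is masked, so the target still bootstraps.
masked = time_limit_done_bool(done=True, episode_step=999, max_episode_steps=1000)
bootstrapped = td_target(reward=1.0, next_value=10.0, done_bool=masked)

# A genuine early termination would cut the bootstrap term.
terminal = time_limit_done_bool(done=True, episode_step=10, max_episode_steps=1000)
cut_off = td_target(reward=1.0, next_value=10.0, done_bool=terminal)
```

With the mask, a time-limit transition is treated like any other step, which matches the idea that the task itself has no true terminal state.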

yueyang130 commented 1 year ago

Regarding @geekyutao's question: the point is that next_ob is never the start observation of the next episode. At the last timestep of an episode, next_ob is the terminal state of that episode and done is true (note that done_bool is always false, whereas done is true at the max step). Only then is env reset and ob set to the start observation.
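The ordering described above can be checked with a small simulation. This is only a sketch under stated assumptions: `ToyEnv` is a hypothetical stand-in for a time-limited task (integer observations, start observation 0), and `collect` imitates the general shape of a reset-then-step collection loop rather than reproducing the repo's train.py.

```python
class ToyEnv:
    """Hypothetical time-limited env; the observation is the step count."""

    def __init__(self, max_episode_steps=3):
        self._max_episode_steps = max_episode_steps
        self._t = 0

    def reset(self):
        self._t = 0
        return 0  # start observation

    def step(self, action):
        self._t += 1
        done = self._t >= self._max_episode_steps  # ends only at the step limit
        return self._t, 1.0, done, {}


def collect(env, num_steps):
    """Store (obs, next_obs, done_bool, done) for each environment step."""
    buffer = []
    done = True  # forces a reset on the first iteration
    obs, episode_step = None, 0
    for _ in range(num_steps):
        if done:
            # The reset happens only after the previous transition was stored.
            obs = env.reset()
            done = False
            episode_step = 0
        next_obs, reward, done, _ = env.step(action=0)
        # Mask the flag at the step limit (assumed to match the linked line).
        done_bool = 0.0 if episode_step + 1 == env._max_episode_steps else float(done)
        buffer.append((obs, next_obs, done_bool, done))
        obs = next_obs
        episode_step += 1
    return buffer


transitions = collect(ToyEnv(max_episode_steps=3), num_steps=7)
```

In this toy run, every stored next_obs comes from the same episode as its obs, so no transition crosses an episode boundary even though done_bool is never 1.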