amazon-science / meta-q-learning

Code for the paper "Meta-Q-Learning" (ICLR 2020)
https://arxiv.org/abs/1910.00125

Why does the previous state_list add the initial observation twice in runner.py in the misc folder? #4

Closed Niyx52094 closed 3 years ago

Niyx52094 commented 3 years ago

Hello, this work on meta Q-learning is very inspiring and I have been trying to use it in my project. While reading the code I noticed that in runner.py (in the misc folder) the list of previous observations adds the initial observation twice, and I'm not sure whether that is intentional. Based on the paper, these lists of previous observations, rewards, and actions are fed into a recurrent encoder (GRU) to generate the context variable. If so, shouldn't the observation be added only once? In other words, should the observation at line 80 be initialized randomly or with np.zeros()?

[two screenshots of the relevant code were attached]
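A rough sketch of the pattern in question (variable names and shapes are made up, not copied from runner.py):

```python
import numpy as np

# Sketch of the initialization being asked about (hypothetical names/dims):
# the observation history that later feeds the context encoder starts out
# with the initial observation appended twice, while the action and reward
# histories start as zeros.
obs_dim, act_dim = 17, 6
obs = np.random.randn(obs_dim)            # stands in for env.reset()

prev_obs     = [obs.copy(), obs.copy()]   # initial observation appears twice
prev_actions = [np.zeros(act_dim), np.zeros(act_dim)]
prev_rewards = [np.zeros(1), np.zeros(1)]

# Each step, the histories are stacked into one array that a recurrent
# context encoder (a GRU in the paper) consumes to produce the context variable.
context_input = np.concatenate(
    [np.stack(prev_obs), np.stack(prev_actions), np.stack(prev_rewards)],
    axis=-1,
)
print(context_input.shape)                # (2, obs_dim + act_dim + 1)
```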

rasoolfa commented 3 years ago

Hi there,

The observation is added twice only for the first time step, so that the model selects the first action based on the starting observation (see https://github.com/amazon-research/meta-q-learning/blob/master/misc/runner_multi_snapshot.py#L101). That said, I don't think initializing the observation history has a significant effect on the results, so you can simply use zeros at line 80 instead.
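For illustration, a rough sketch of the zero-initialization alternative (made-up names and shapes, not the actual runner.py code):

```python
import numpy as np

# Sketch of the suggested alternative: initialize the observation history
# with zeros instead of repeating the starting observation. The first action
# is still selected from the real starting observation, so this choice has
# little practical effect on the results.
obs_dim, act_dim, history_len = 17, 6, 2
obs = np.random.randn(obs_dim)                   # stands in for env.reset()

prev_obs     = [np.zeros(obs_dim) for _ in range(history_len)]  # zeros instead of obs
prev_actions = [np.zeros(act_dim) for _ in range(history_len)]
prev_rewards = [np.zeros(1)       for _ in range(history_len)]

# First time step: the action is computed from the actual starting observation
# plus whatever context the (zero-initialized) history yields, e.g.:
# action = policy.select_action(obs, history=(prev_obs, prev_actions, prev_rewards))
```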

Hope that helps. Rasool

Niyx52094 commented 3 years ago

Got it, thank you!