Closed Niyx52094 closed 3 years ago
Hello, this work on meta Q-learning is very inspiring, and I tried to implement it in my project. When I checked the code, I noticed that the list of previous observations in misc/runner_multi_snapshot.py adds the current obs twice, and I'm not sure whether that is a mistake. Based on the paper, these lists of previous observations, rewards, and actions are fed into the GRU to generate the context variable. If so, shouldn't the obs be added only once? That is, should the obs at line 80 be initialized randomly or with np.zeros()?

Hi there,
The duplicated observation only matters for the first time step: it lets the model select the first action based on the starting observation (see https://github.com/amazon-research/meta-q-learning/blob/master/misc/runner_multi_snapshot.py#L101). That said, I don't think the initialization of the observation history has a significant effect on results, so you can simply use zeros at line 80 instead.
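For readers implementing this themselves, here is a minimal sketch of the zero-initialization suggested above. The function and variable names (`init_history`, `act_dim`, `env`) are illustrative assumptions, not the repo's actual API; only the idea of seeding the first-step history with zeros comes from this thread.

```python
import numpy as np

def init_history(first_obs, act_dim):
    """Build the context buffers for the first time step of a rollout.

    Hypothetical sketch: the repo seeds the "previous observation" slot
    with a second copy of the starting observation; since that slot only
    influences the very first action, a zero vector works just as well.
    """
    prev_obs = [np.zeros_like(first_obs)]              # instead of [first_obs.copy()]
    prev_acts = [np.zeros(act_dim, dtype=np.float32)]  # no action has been taken yet
    prev_rews = [np.zeros(1, dtype=np.float32)]        # no reward has been received yet
    return prev_obs, prev_acts, prev_rews

# Usage with a gym-style env (illustrative):
#   obs = env.reset()
#   prev_obs, prev_acts, prev_rews = init_history(obs, env.action_space.shape[0])
```

Either way, only the context input at the first time step changes, so the rest of the rollout is unaffected.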
Hope that helps. Rasool
Got it, thank you!