Hello, I am a little confused about this equation:
self.future_p = 1 - (1. / (1 + replay_k))
I think replay_k means that we want to select k transitions in one episode (50 transitions) for computing HER goals, but how does future_p correspond to this? Can you give some interpretation? Thank you!
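For reference, the arithmetic of that expression can be checked on its own. Algebraically, 1 - 1/(1 + k) = k/(1 + k), so one possible reading (an assumption, not something confirmed in this post) is that future_p is a per-sample probability of relabeling a transition with a future goal, giving on average k relabeled samples for every 1 unchanged sample. The sketch below only evaluates the formula; the helper names `future_p` and `relabel_ratio` are hypothetical and not taken from the library.

```python
def future_p(replay_k: float) -> float:
    """Evaluate the expression from the question for a given replay_k."""
    return 1 - (1.0 / (1 + replay_k))


def relabel_ratio(p: float) -> float:
    """Odds p / (1 - p): expected relabeled samples per unchanged sample.

    Assumes p is a per-sample relabeling probability (hypothetical reading).
    """
    return p / (1 - p)


if __name__ == "__main__":
    for k in (0, 1, 4, 8):
        p = future_p(k)
        # Under the assumed reading, the odds recover replay_k itself.
        print(f"replay_k={k}  future_p={p:.4f}  ratio={relabel_ratio(p):.4f}")
```

Under this reading, replay_k would act as a ratio of relabeled to original samples rather than a count of transitions picked per episode.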