whynpt opened this issue 2 years ago
Yes, I have the same question. I have seen other projects that use different sampling methods for the future strategy, yet the results of this project are very good. Like you, I am confused about whether to relabel each transition with a specific number k of goals, or to relabel a ratio-k fraction of the sampled transitions.
I didn't change the implementation of the future strategy, but there is another project that relabels the 'desired goal' with a specific number k of goals. See https://github.com/kaixindelele/DRLib.git @captainzhu123
I think it was the author of this project who modified the way sub-goals are selected. You raised this question earlier as well. I also think the sampled experience should not be relabeled directly according to a ratio; the paper's scheme is closer to a kind of data augmentation. @whynpt
Okay, so after some research I think he mentions the reasoning behind this code in this paper (page 32): https://link.springer.com/content/pdf/10.1007/978-3-030-89370-5.pdf?pdf=button. The paper is titled "Diversity-Based Trajectory and Goal Selection with Hindsight Experience Replay". The code is also available here, where the same HER sampling method is used: https://github.com/TianhongDai/div-hindsight/blob/master/baselines/her/her.py
Thanks a lot! This project works well with my own robotic environment. But I am confused about `her.her_sampler.sample_her_transitions`, because it is quite different from the future strategy as I understand it. In the paper, k goals are sampled for every transition in the buffer, and then k new transitions are stored in the buffer, which looks like data augmentation. In the code, `replay_k` means the ratio of transitions to relabel, not the number of goals. As `her.her_sampler.sample_her_transitions` shows, when updating the network, 256 transitions are chosen and a fraction of their goals are replaced with achieved goals. Is replacing goals proportionally equivalent to the future strategy?
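For what it's worth, here is a minimal sketch of the ratio-based "future" relabeling being discussed, in the style of the baselines-like `her.py` linked above. This is my own simplified version, not the project's actual code: the function name `make_future_sampler` and the single-episode buffer layout are assumptions for illustration. The point it demonstrates is that `replay_k` sets a relabel *probability* `future_p = 1 - 1/(1+replay_k)` (e.g. `replay_k=4` → 80% of a minibatch relabeled), rather than storing k extra transitions per step:

```python
import numpy as np

def make_future_sampler(replay_k):
    """Ratio-based 'future' HER relabeling (illustrative sketch).

    replay_k controls the ratio of relabeled to original transitions
    in each sampled minibatch: future_p = 1 - 1/(1 + replay_k).
    """
    future_p = 1 - 1.0 / (1 + replay_k)

    def sample(episode_ag, episode_g, batch_size, T, rng=None):
        # episode_ag: (T+1, goal_dim) achieved goals of one episode
        # episode_g:  (goal_dim,)     the episode's original desired goal
        if rng is None:
            rng = np.random.default_rng()
        # pick random timesteps from the episode
        t_samples = rng.integers(0, T, size=batch_size)
        goals = np.tile(episode_g, (batch_size, 1))
        # decide WHICH transitions get relabeled: a ratio, not k copies
        her_mask = rng.random(batch_size) < future_p
        # for the relabeled ones, pick an achieved goal from a FUTURE
        # timestep of the same episode (the 'future' strategy)
        future_offset = (rng.random(batch_size) * (T - t_samples)).astype(int)
        future_t = t_samples + 1 + future_offset
        goals[her_mask] = episode_ag[future_t[her_mask]]
        return t_samples, goals, her_mask

    return sample
```

In expectation this yields the same mix of original and hindsight goals as storing k relabeled copies per transition would, but it does the relabeling lazily at sampling time instead of inflating the buffer, which is presumably why the code treats `replay_k` as a ratio.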