whynpt opened this issue 2 years ago
Yes, I have the same question. I have seen other projects that use different sampling methods for the future strategy, yet the results of this project are very good. Like you, I am confused about whether to relabel each transition with a specific number k of goals, or to relabel a ratio-k fraction of the sampled transitions.
I didn't change the implementation of the future strategy, but there is another project that relabels the 'desired goal' with a specific number k of goals. See https://github.com/kaixindelele/DRLib.git @captainzhu123
I think it was the author of this project who modified the way sub-goals are selected. You raised this question earlier as well. I also think the sampled experience should not be relabeled directly according to a ratio; the paper's scheme is closer to a kind of data augmentation. @whynpt
Okay, so after some research I think he mentions the reasoning behind this code in this paper (page 32): https://link.springer.com/content/pdf/10.1007/978-3-030-89370-5.pdf?pdf=button. The paper is titled "Diversity-Based Trajectory and Goal Selection with Hindsight Experience Replay". The code is also available here, where the same HER sampling method is used: https://github.com/TianhongDai/div-hindsight/blob/master/baselines/her/her.py
Thanks a lot! This project works well with my own robotic environment. But I am confused about `her.her_sampler.sample_her_transitions`, because it is quite different from the future strategy as I understand it. In the paper, k goals are sampled for every transition in the buffer, and then k new transitions are stored in the buffer, which looks like data augmentation. In the code, `replay_k` means the ratio of transitions to relabel, not the number of goals. As `her.her_sampler.sample_her_transitions` shows, when updating the network, 256 transitions are chosen and a fraction of their goals are replaced with achieved goals. Is replacing goals proportionally equivalent to the future strategy?
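For what it's worth, here is a minimal sketch of the ratio-based "future" relabeling being discussed, in the style of the baselines-like `her.py` linked above. This is my own simplified version, not the project's actual code: the function name `make_future_sampler` and the single-episode buffer layout are assumptions for illustration. The point it demonstrates is that `replay_k` sets a relabel *probability* `future_p = 1 - 1/(1+replay_k)` (e.g. `replay_k=4` → 80% of a minibatch relabeled), rather than storing k extra transitions per step:

```python
import numpy as np

def make_future_sampler(replay_k):
    """Ratio-based 'future' HER relabeling (illustrative sketch).

    replay_k controls the ratio of relabeled to original transitions
    in each sampled minibatch: future_p = 1 - 1/(1 + replay_k).
    """
    future_p = 1 - 1.0 / (1 + replay_k)

    def sample(episode_ag, episode_g, batch_size, T, rng=None):
        # episode_ag: (T+1, goal_dim) achieved goals of one episode
        # episode_g:  (goal_dim,)     the episode's original desired goal
        if rng is None:
            rng = np.random.default_rng()
        # pick random timesteps from the episode
        t_samples = rng.integers(0, T, size=batch_size)
        goals = np.tile(episode_g, (batch_size, 1))
        # decide WHICH transitions get relabeled: a ratio, not k copies
        her_mask = rng.random(batch_size) < future_p
        # for the relabeled ones, pick an achieved goal from a FUTURE
        # timestep of the same episode (the 'future' strategy)
        future_offset = (rng.random(batch_size) * (T - t_samples)).astype(int)
        future_t = t_samples + 1 + future_offset
        goals[her_mask] = episode_ag[future_t[her_mask]]
        return t_samples, goals, her_mask

    return sample
```

In expectation this yields the same mix of original and hindsight goals as storing k relabeled copies per transition would, but it does the relabeling lazily at sampling time instead of inflating the buffer, which is presumably why the code treats `replay_k` as a ratio.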