astooke / rlpyt

Reinforcement Learning in PyTorch
MIT License

add Hindsight Experience Replay to replay buffers #6

Open MishaLaskin opened 5 years ago

MishaLaskin commented 5 years ago

Since HER applies to any off-policy algorithm, I think this would be useful for researchers studying sparse-reward problems.

I can take a crack at this and submit a PR. Let me know what you think @astooke!

astooke commented 5 years ago

Hey, sorry I missed this; I was still doing some core dev way back when. Yeah, this sounds like a great thing to add! I don't have experience with it myself, but we can talk about a plan for how to add it in?

MishaLaskin commented 5 years ago

Yeah, sounds good. It requires the following two additions:

  1. A goal-based environment wrapper that stores goals (desired and achieved) in the timestep object.
  2. A relabeling replay buffer. Since you already have the capability of extracting sequences from the replay buffer, we could use that to extract a trajectory and relabel its goals. This step can actually be done asynchronously as long as full trajectories can be extracted from the replay buffer (see the sketch after this list).
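
A rough sketch of what the relabeling could look like for HER's simple "final" strategy (the names `relabel_trajectory` and `compute_reward`, and the goal fields, are placeholders here, not existing rlpyt API):

```python
import numpy as np


def relabel_trajectory(observation, action, compute_reward):
    """HER "final" strategy: overwrite the desired goal at every step of a
    sampled trajectory with the goal actually achieved at its last step,
    then recompute the sparse rewards under that new goal.

    `observation` is assumed to be a namedarraytuple-like structure with
    `achieved_goal` and `desired_goal` fields and a leading time dimension.
    """
    new_goal = observation.achieved_goal[-1]
    relabeled_obs = observation._replace(
        desired_goal=np.broadcast_to(
            new_goal, observation.desired_goal.shape).copy(),
    )
    # Sparse reward, e.g. 0 when the achieved goal matches the new goal, -1 otherwise.
    relabeled_reward = np.array(
        [compute_reward(ag, new_goal) for ag in observation.achieved_goal])
    return relabeled_obs, action, relabeled_reward
```

The relabeled copies would then be stored (or sampled) alongside the original transitions, mixed in at whatever ratio the chosen HER strategy calls for.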

Let's transfer the discussion to the rll Slack.

ritalaezza commented 4 years ago

Hey, how is progress on this feature going? I was trying to see whether I could implement it around rlpyt myself, but it isn't very straightforward. A good source to look into might be the implementation used in stable-baselines, which seems to consist of only a couple of wrappers.

astooke commented 4 years ago

Good question. @MishaLaskin, let's talk this week?

Thanks @ritalaezza for the reference source!

MishaLaskin commented 4 years ago

Sure! Maybe after the ICML supplementary material deadline?

bycn commented 4 years ago

Are there any updates / ETA on this issue? Are there any rlpyt-specific blockers for the implementation?

MishaLaskin commented 4 years ago

No blockers; it's mostly a matter of time commitment, and it's currently on the back burner.

DennisCraandijk commented 4 years ago

Really looking forward to this feature. Do you have an ETA?

astooke commented 4 years ago

OK, it seems people around the group have gotten a little busy, and there's still no movement here as far as I know. If anyone wants to work on this, that would be most helpful! I'm happy to advise through emails and phone calls. :)

astooke commented 4 years ago

@MishaLaskin, double-checking your status on this?

ritalaezza commented 4 years ago

If nobody else has the time, I guess I can give it a try. I have already been using rlpyt for goal-based environments.

ZiwenZhuang commented 3 years ago

Considering that HER is built on goal-conditioned RL, I believe it is better to first implement some plugins for multi-modal RL. Both "achieved_goal" and "desired_goal" can then be fields in the observation (a namedarraytuple).
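
For instance, such an observation could be built with rlpyt's `namedarraytuple` (the field names below follow the gym `GoalEnv` convention and are only an illustration):

```python
from rlpyt.utils.collections import namedarraytuple

# Illustrative goal-conditioned observation; "achieved_goal" and
# "desired_goal" ride along with the regular observation.
GoalObservation = namedarraytuple(
    "GoalObservation", ["observation", "achieved_goal", "desired_goal"])

# A goal-based env wrapper would then return, e.g.:
# obs = GoalObservation(observation=o, achieved_goal=ag, desired_goal=dg)
```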

Also, a small Mixin implementation works for a multi-modal PPO agent that outputs actions whose space consists of a FloatBox and an IntBox. I can make my current implementation public if needed, although I am currently using it for a research project. 😄
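
As a rough illustration of that kind of mixed action (assuming rlpyt's FloatBox and IntBox spaces; the exact composite layout here is hypothetical):

```python
from rlpyt.spaces.float_box import FloatBox
from rlpyt.spaces.int_box import IntBox
from rlpyt.utils.collections import namedarraytuple

# Hypothetical multi-modal action with a continuous and a discrete part.
MixedAction = namedarraytuple("MixedAction", ["continuous", "discrete"])

continuous_space = FloatBox(low=-1.0, high=1.0, shape=(3,))  # e.g. an end-effector delta
discrete_space = IntBox(low=0, high=2)  # e.g. a binary gripper command (high is exclusive)
```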

ZiwenZhuang commented 3 years ago

I have implemented HER by adding a multi-modal interface to the repo; please check out my implementation at https://github.com/ZiwenZhuang/rlpyt/tree/HER/rlpyt/projects/goalRL

There is a README waiting for you. 😄