MishaLaskin opened this issue 5 years ago
Hey, sorry I missed this since I was still doing some core dev back then. Yeah, this sounds like a great thing to add! I don't have experience with it myself, but we can talk about a plan for how to add it?
Yeah, sounds good. It requires the following two additions:
let's transfer the discussion to the rll Slack
Hey, how is the progress on this feature? I was trying to see if I could implement this around rlpyt myself, but it isn't very straightforward. Maybe a good source to look into is the implementation used in stable-baselines, which seems to consist only of a couple of wrappers.
Good question, @MishaLaskin let's talk this week?
Thanks @ritalaezza for the reference source!
Sure! Maybe after ICML supplementary material deadline?
Are there any updates / ETA on this issue? Are there any rlpyt-specific blockers for the implementation?
no blockers - mostly time commitment, it's currently on the back burner
Really looking forward to this feature. Do you have an ETA?
OK, it seems people around the group have gotten a little busy; still no movement here as far as I know. If anyone wants to work on this, that would be most helpful! I'm happy to advise through emails and phone calls. :)
@MishaLaskin double checking your status on this?
If nobody else has the time, I guess I can give it a try. I have already been using rlpyt for goal-based environments.
Considering that HER is based on goal-conditioned RL, I believe it is better to first implement support for multi-modal observations. Both "achieved_goal" and "desired_goal" can then be fields in the observation (a namedarraytuple).
Also, a small Mixin implementation makes it possible to build a multi-modal PPO agent that outputs an action whose space consists of a FloatBox and an IntBox. I can make my current implementation public if needed, although I am currently using it for a research project. 😄
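For reference, a goal-conditioned observation along these lines might look like the sketch below, using a plain `namedtuple` as a stand-in for rlpyt's `namedarraytuple` (the field names follow the Gym `GoalEnv` convention; the exact structure is an assumption, not rlpyt's actual API):

```python
from collections import namedtuple

# Stand-in for rlpyt's namedarraytuple; in rlpyt these fields would hold
# batched numpy/torch arrays with leading time/batch dimensions.
GoalObservation = namedtuple(
    "GoalObservation", ["observation", "achieved_goal", "desired_goal"]
)

obs = GoalObservation(
    observation=[0.1, 0.2, 0.3],   # raw state features
    achieved_goal=[0.5, 0.5],      # goal the agent actually reached
    desired_goal=[1.0, 1.0],       # goal sampled for this episode
)

# Agents and models can index fields by name, so goal information travels
# through the sampler and replay buffer alongside the observation.
print(obs.desired_goal)  # [1.0, 1.0]
```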
I have implemented HER by adding a multi-modal interface to the repo; please check out my implementation at https://github.com/ZiwenZhuang/rlpyt/tree/HER/rlpyt/projects/goalRL
There is a Readme waiting for you. 😄
Since HER applies to any off-policy algorithm, I think this would be useful for researchers studying sparse-reward problems.
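For context, the core of HER is relabeling stored transitions with goals that were actually achieved later in the episode, and recomputing the sparse reward against the new goal. Below is a minimal sketch of the "final" relabeling strategy from the HER paper; all names (`Transition`, `sparse_reward`, `her_relabel_final`) are illustrative, not rlpyt API:

```python
from collections import namedtuple

# A toy transition: what goal was reached, what goal was wanted, the reward.
Transition = namedtuple("Transition", ["achieved_goal", "desired_goal", "reward"])

def sparse_reward(achieved, desired):
    """Sparse goal reward: 0 on success, -1 otherwise."""
    return 0.0 if achieved == desired else -1.0

def her_relabel_final(episode):
    """Return extra transitions relabeled with the episode's final
    achieved goal (the "final" strategy), recomputing the reward."""
    final_goal = episode[-1].achieved_goal
    return [
        Transition(
            achieved_goal=t.achieved_goal,
            desired_goal=final_goal,
            reward=sparse_reward(t.achieved_goal, final_goal),
        )
        for t in episode
    ]

# Example: the agent never reached the desired goal (2, 2) ...
episode = [
    Transition(achieved_goal=(0, 0), desired_goal=(2, 2), reward=-1.0),
    Transition(achieved_goal=(1, 1), desired_goal=(2, 2), reward=-1.0),
]
relabeled = her_relabel_final(episode)
# ... but relative to the final achieved goal (1, 1), the last
# transition is now a success and gets reward 0.
print(relabeled[-1].reward)  # 0.0
```

In an off-policy setup, these relabeled transitions would simply be appended to the replay buffer next to the originals, which is why the technique composes with algorithms like DDPG or SAC.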
I can take a crack at this and submit a PR. Let me know what you think @astooke!