cbfinn / gps

Guided Policy Search
http://rll.berkeley.edu/gps/

Add support for images and multi-modal networks. #15

Closed bstadie closed 8 years ago

bstadie commented 8 years ago

This is largely a backport of @avivt's work to include images/multi-modal networks. I have rewritten it for TensorFlow and cleaned up code that broke non-MuJoCo agents.

Major Changes

  1. Adds support for learning with image observations. Currently, these changes only work with MuJoCo and the TensorFlow backend.
  2. In conjunction with being able to train on images, you can now train multi-modal networks.
  3. Adds the ability to specify train and test conditions in the hyperparams file.
  4. Adds the ability to set different xml files per condition in MuJoCo.

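
To make items 3 and 4 concrete, here is a minimal sketch of what a train/test split with one xml file per condition could look like. The key names (`conditions`, `train_conditions`, `test_conditions`, `filename`), the file names, and the `split_conditions` helper are all hypothetical and chosen for illustration; they are not copied from this PR, so check the actual hyperparams files for the real names.

```python
# Hypothetical sketch: key names and file names are illustrative assumptions,
# not taken from the PR.
common = {
    'conditions': 4,
    'train_conditions': [0, 1, 2],
    'test_conditions': [3],
}

agent = {
    # One MuJoCo model file per condition (hypothetical file names).
    'filename': [
        'peg_cond0.xml',
        'peg_cond1.xml',
        'peg_cond2.xml',
        'peg_cond3.xml',
    ],
}


def split_conditions(common, agent):
    """Validate the config and return (train, test) lists of (condition, xml) pairs."""
    n = common['conditions']
    files = agent['filename']
    if len(files) != n:
        raise ValueError('expected one xml file per condition')
    train = common['train_conditions']
    test = common['test_conditions']
    for idx in train + test:
        if not 0 <= idx < n:
            raise ValueError('condition index %d out of range' % idx)
    return ([(i, files[i]) for i in train],
            [(i, files[i]) for i in test])


train_set, test_set = split_conditions(common, agent)
print(train_set)
print(test_set)
```

Keeping the xml files in a list indexed by condition means the train and test lists only need to carry integers, and a mismatch between the two is caught before any simulation starts.
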
cbfinn commented 8 years ago

Thanks Bradly! Huge PR - I left a bunch of comments.

Agreed on issues 1 and 2. Regarding images+ROS, @xinyutan17 and @emilyscharff are working on it!

Let me know when you finish replying to/addressing comments, and I'll take another look.

bstadie commented 8 years ago

Thanks for all the suggestions Chelsea. I'm pretty bogged down at the moment but hope to have time during the weekend.

cbfinn commented 8 years ago

Sounds good!

On Fri, Apr 15, 2016 at 12:46 AM, Bradly Stadie notifications@github.com wrote:

Thanks for all the suggestions Chelsea. I'm pretty bogged down at the moment but hope to have time to look over this during the weekend.


bstadie commented 8 years ago

Alright! I have finished addressing your last round of comments.

Also, while we're changing the docs to reflect tf support, can we add my name to the contributor list? Didn't want to add it myself.

cbfinn commented 8 years ago

Other than the comments above, LGTM. Thanks Bradly!

bstadie commented 8 years ago

Sorry about all the missing code. I think the main in the old repo was a little different, and merging them together was kind of tricky. Will add these things back in.

bstadie commented 8 years ago

Alright. I think I've taken care of everything. Training the peg example on images works well (though it does take the full 10 iterations) and I've managed to eliminate a few extra corner cases that this code had introduced. Let me know what you think.

cbfinn commented 8 years ago

It looks like the peg images example only has one condition. Is vision needed to do the task?

bstadie commented 8 years ago

It's the same task as in the MuJoCo badmm example, so probably not.

cbfinn commented 8 years ago

Ok. Can you add a TODO in the hyperparams file to make it an example that requires vision (and mention that the purpose of the example is to demonstrate how to use images)?

I don't want the example to be misleading, but also don't want this PR to hang on that, because it might take some work to get a working example with vision.

On Thu, Apr 21, 2016 at 11:16 AM, Bradly Stadie notifications@github.com wrote:

It's the same task as in the MuJoCo badmm example, so probably not.


bstadie commented 8 years ago

Sorry Chelsea, I think there was a slight misunderstanding. When you asked if the task needed vision, I thought you were asking if the task in general required vision. The answer is no if you give it the target end-effector (ee) points and current ee points, as in the mjc_badmm example.

However, if you withhold the ee point data and give it only the joint angles and velocities while varying the initial conditions, the neural net policies will perform poorly, as expected. If you also supply images, the policy is able to adapt to all 4 conditions in the peg example.
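
As a rough illustration of that setup, the sketch below builds a flat observation from joint angles, joint velocities, and a flattened image, while leaving the ee points out. The sensor names, the dimensions (a tiny 4x4x3 image, 7 joints), and the helpers `build_layout`, `pack`, and `unpack` are assumptions made up for this example, not the codebase's actual API.

```python
# Hypothetical sketch: sensor names, sizes, and helpers are illustrative only.
SENSOR_DIMS = {
    'joint_angles': 7,
    'joint_velocities': 7,
    'ee_points': 6,
    'image': 4 * 4 * 3,  # tiny flattened RGB image, for illustration
}


def build_layout(sensor_dims, include):
    """Map each included sensor to its (start, end) range in the flat observation."""
    layout = {}
    offset = 0
    for name in include:
        dim = sensor_dims[name]
        layout[name] = (offset, offset + dim)
        offset += dim
    return layout, offset


def pack(sample, layout, total):
    """Copy each sensor reading into its range of a flat observation list."""
    obs = [0.0] * total
    for name, (start, end) in layout.items():
        values = sample[name]
        if len(values) != end - start:
            raise ValueError('wrong size for sensor %s' % name)
        obs[start:end] = values
    return obs


def unpack(obs, layout, name):
    """Recover one sensor's values from the flat observation."""
    start, end = layout[name]
    return obs[start:end]


# Withhold ee_points: the policy sees only joint state plus the image.
layout, total = build_layout(
    SENSOR_DIMS, ['joint_angles', 'joint_velocities', 'image'])

sample = {
    'joint_angles': [0.1] * 7,
    'joint_velocities': [0.0] * 7,
    'image': [0.5] * 48,
}
obs = pack(sample, layout, total)
print(layout)
print(total)
```

With a layout like this, a multi-modal network could send the `image` range through convolutional layers and the remaining entries through fully connected layers before merging them; that routing is one common design, not necessarily the one used in this PR.
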

I see now that I only had one initial condition in the peg example. That was for debugging, and I forgot to change it back. Sorry for any confusion that may have caused.

Here is a video of the policy trained on only joint angles and velocities (which can't do the task) vs one trained with images (which adapts correctly).

https://youtu.be/Nk4NiO8Ofr0

I've updated the hyperparams file to include all 4 conditions.

cbfinn commented 8 years ago

Ok, cool! The 1 condition in the hyperparams file was what I was worried about.

Do you think it would make sense to add a link to the video in the code or in the docs, so that people know what the expected behavior looks like?

Other than that, feel free to merge the PR!

bstadie commented 8 years ago

Added a link to the video in the hyperparams file. Will merge now.

cbfinn commented 8 years ago

Thanks Bradly!

On Thu, Apr 21, 2016 at 8:58 PM, Bradly Stadie notifications@github.com wrote:

Merged #15 https://github.com/cbfinn/gps/pull/15.
