benelot / pybullet-gym

Open-source implementations of OpenAI Gym MuJoCo environments for use with the OpenAI Gym Reinforcement Learning Research Platform.
https://pybullet.org/

Are pretrained models MuJoCo compatible? #31

Open jendelel opened 5 years ago

jendelel commented 5 years ago

Hi,

I already had to model my environments in MuJoCo because of baseline algorithms that used it. As a proof of concept, I would like to take your pretrained Humanoid and let it run through my environment. I tried to reconstruct the observations based on your code, but haven't succeeded so far. I can send my code if you want.

Do you think such a port is even possible? It seems that the actions are too large.

Thank you, Lukas

benelot commented 5 years ago

Hi! That was the original intent of the reimplementations! In terms of observation and action lengths, they should actually be compatible. Check out HumanoidMuJoCoEnv-v0; the other one is the roboschool variant, which is oddly different (no idea why they did that). In case the lengths are not equal, tell me. Another issue for now is that I have no idea how the observations correspond between MuJoCo and PyBullet. MuJoCo exposes a cryptic state vector for the environment, and it is hard to find out what the OpenAI folks did there, so I have a hard time getting the right observations.
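
A quick sanity check along these lines (a minimal sketch; it assumes gym, mujoco-py, and pybullet-gym are installed, and that the classic Humanoid-v2 env is available):

```python
import gym
import pybulletgym  # registers the pybullet-gym envs on import

# Compare observation/action shapes between the MuJoCo reference env
# and the pybullet-gym reimplementation.
mj = gym.make("Humanoid-v2")           # needs mujoco-py + a MuJoCo install
pb = gym.make("HumanoidMuJoCoEnv-v0")  # pybullet-gym port

print("mujoco obs/act:", mj.observation_space.shape, mj.action_space.shape)
print("bullet obs/act:", pb.observation_space.shape, pb.action_space.shape)
```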

I am very interested in pretrained agents for all the OpenAI Gym MuJoCo envs, btw, so that I can test the similarity of my envs to the OpenAI Gym reference implementation. If you want to help me with any of this, that would be great!

benelot commented 5 years ago

On the chance that the port works: I have no idea, actually. So far it has only worked with the pendulums, to be honest. I am still stuck on the observations, so I can not really tell yet.

jendelel commented 5 years ago

Hi,

Thanks a lot for your reply. By debugging step by step, I think I got the observations to be almost the same as in PyBullet, and the predicted joint torques were very similar. However, after the action went through the engine, I got quite different observations.

I guess I missed something with the actuators, or MuJoCo just handles these things differently. In the end, the humanoid fell and didn't get up.
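
For the record, the step-by-step check looked roughly like the sketch below (`compare_rollout`, `policy`, and the env handles are placeholder names, and the old gym 4-tuple step API is assumed):

```python
import numpy as np

def compare_rollout(env_a, env_b, policy, n_steps=100):
    # Step both sims with the same actions and watch where the
    # observations start to diverge.
    obs_a = env_a.reset()
    obs_b = env_b.reset()
    for t in range(n_steps):
        action = policy(obs_a)                    # act from env A's observation
        obs_a, _, done_a, _ = env_a.step(action)
        obs_b, _, done_b, _ = env_b.step(action)  # replay the same action in env B
        diff = np.max(np.abs(np.asarray(obs_a) - np.asarray(obs_b)))
        print(f"step {t}: max obs diff = {diff:.5f}")
        if done_a or done_b:
            break
```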

We even managed to port most of the reward for the easy Flagrun. After 2 days of training (using SAC), the humanoid seems to move in the right direction, with only occasional falling.

I'd love to help you with pretrained checkpoints for the MuJoCo humanoid; unfortunately, I don't have access to much compute power.

Good luck with your project. I really like what you're doing.

Lukas


jendelel commented 4 years ago

Hi again,

so I managed to port HumanoidFlagrun to MuJoCo. I would like to try to train the same PPO agent you trained for yours. However, Tensorforce has evolved significantly, and it looks like the API isn't the same. Could you tell me the command you used and the version of Tensorforce?

Thanks a lot.

Lukas

benelot commented 4 years ago

Hi! Sorry for getting back to you so late. Unfortunately, I do not remember what version I used, since it was only a preliminary test of whether I could train them at all, and as others have mentioned, for some envs people do not manage to. I can not say exactly why yet; there is still some stuff to do on it.


sgillen commented 4 years ago

Hi all,

I'm currently working on making some bullet environments that are compatible with the OpenAI Mujoco ones. I started making my own before discovering this project.

> I am very interested in pretrained agents for all the OpenAI Gym MuJoCo envs, btw, so that I can test the similarity of my envs to the OpenAI Gym reference implementation. If you want to help me with any of this, that would be great!

I am basically conducting the same experiments, with both your environments and the ones I started. I have some policies that work well in MuJoCo, but I have not gotten them to transfer successfully yet. I've been focusing on Walker2d first (for no particular reason). I see two big problems with the transfer:

  1. I think there are some mismatches in the state vectors. There are things like the starting height being 0 in the bullet env vs. 1.25 in the mujoco one, and I think the joint ordering differs between the two. This kind of stuff will be tedious but easy to fix.

  2. The physics are different between the two simulators. I've been using PyBullet's setPhysicsEngineParameter and changeDynamics to get as close as possible to MuJoCo (see the sketch after this list). The two sims have fundamentally different constraint models, so I'm not sure how close we can get, or whether being policy compatible is possible at all.
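
A rough illustration of that tuning (the specific values here are assumptions, not a verified match; MuJoCo's default timestep of 0.002 s is the only number taken from its docs):

```python
import pybullet as p

p.connect(p.DIRECT)

# Nudge Bullet toward MuJoCo-like integration settings. MuJoCo steps at
# 0.002 s by default; the solver iteration count below is a guess.
p.setPhysicsEngineParameter(fixedTimeStep=0.002, numSolverIterations=50)

# Per-link contact/friction properties go through changeDynamics once a
# body is loaded, e.g. (body_id and link_index are placeholders):
# p.changeDynamics(body_id, link_index, lateralFriction=0.9,
#                  contactStiffness=1e4, contactDamping=1e3)
```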

I'm not sure if any of you are working on this anymore (this thread is rather old now), but if so I'm happy to share what I have.