Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Baselines gym wrapper tutorial is outdated and not functional #1318

Closed gnouhp closed 5 years ago

gnouhp commented 5 years ago

The following README contains an example with the baselines deepq module that seems to only work with an older version of the baselines code: https://github.com/Unity-Technologies/ml-agents/blob/master/gym-unity/README.md

The error that I'm getting is: TypeError: learn() missing 1 required positional argument: 'network', which is directly referenced by an issue on the OpenAI baselines repo here:

https://github.com/openai/baselines/issues/520
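For context, the baselines rewrite made `network` a required argument of `learn()`, replacing the older pattern of passing in a hand-built `q_func` model. A minimal sketch of what the updated call might look like (the `./envs/GridWorld` path and the hyperparameters are placeholders, and the import assumes the 0.5-era gym wrapper):

```python
from baselines import deepq
from gym_unity.envs import UnityEnv  # gym wrapper shipped with ml-agents 0.5.x

env = UnityEnv("./envs/GridWorld", worker_id=0, use_visual=False)

# Post-rewrite baselines: the model is selected by name via the required
# `network` argument instead of a q_func object.
act = deepq.learn(
    env,
    network="mlp",
    lr=1e-3,
    total_timesteps=100000,
    buffer_size=50000,
    exploration_fraction=0.1,
    exploration_final_eps=0.02,
    print_freq=10,
)
act.save("unity_model.pkl")
```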

Is there any chance that someone can update this tutorial's example, and also include more gym wrapper examples/tutorials covering PPO and PPO2? If not, since I'm committed to using ml-agents for my research, I'd be glad to put together such a tutorial once I figure out how to get the gym wrapper working smoothly with baselines.

Thanks!

awjuliani commented 5 years ago

Hi @gnouhp Thanks for bringing this up. I was aware that OpenAI had recently done a large rewrite of their Baselines, but I hadn't had the time to ensure that the examples I provided were still applicable. If you have the time and interest, we'd happily accept updated examples. If not, I can add it to our worklog and hopefully address it myself soon.

gnouhp commented 5 years ago

@awjuliani I'll give it a go! If I can't get the updates created in a day or two I'll add another comment to this thread.

Sohojoe commented 5 years ago

@gnouhp - I have been looking into Baselines over the past few days and I have a working repo with a recent baselines + ml-agents 0.5.1 + MarathonEnvs: https://github.com/Sohojoe/MarathonEnvsBaselines - you will need to build the hopper or walker environment and put it into the env folder (let me know if you get stuck).

I was able to get multi-agent training using MPI (one agent per CPU) - this requires building an environment with a single agent; MPI then spins up multiple instances (see the sketch below). I really like how ml-agents supports multiple agents within an environment and would love to figure out how we can keep that pattern with some of the more optimized algorithms in baselines.
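A sketch of that pattern (not Sohojoe's actual code): use the MPI rank as the gym-unity worker_id so each process launches its own Unity instance on its own port.

```python
from mpi4py import MPI  # baselines uses mpi4py for its MPI-based trainers
from gym_unity.envs import UnityEnv

# One single-agent Unity build per MPI process; worker_id offsets the port
# each Unity instance listens on, so the processes don't collide.
rank = MPI.COMM_WORLD.Get_rank()
env = UnityEnv("./envs/Hopper", worker_id=rank, use_visual=False)
```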

Note there is a bug in baselines that impacts saving... from their docs:

NOTE: At the moment Mujoco training uses VecNormalize wrapper for the environment which is not being saved correctly; so loading the models trained on Mujoco will not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd by TfRunningMeanStd in baselines/common/vec_env/vec_normalize.py. This way, mean and std of environment normalizing wrapper will be saved in tensorflow variables and included in the model file; however, training is slower that way - hence not including it by default
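A rough sketch of that swap (assuming, as in baselines at the time, that both classes live in baselines/common/running_mean_std and that VecNormalize constructs them in its __init__ - verify the names and signatures against your checkout):

```python
# baselines/common/vec_env/vec_normalize.py (sketch, not the full file)
from baselines.common.vec_env import VecEnvWrapper
from baselines.common.running_mean_std import TfRunningMeanStd

class VecNormalize(VecEnvWrapper):
    def __init__(self, venv, ob=True, ret=True, clip_obs=10., gamma=0.99, epsilon=1e-8):
        VecEnvWrapper.__init__(self, venv)
        # TfRunningMeanStd keeps mean/std in TensorFlow variables, so they are
        # written into the checkpoint alongside the network weights; the plain
        # numpy RunningMeanStd is silently dropped on save.
        self.ob_rms = TfRunningMeanStd(shape=self.observation_space.shape, scope='ob_rms') if ob else None
        self.ret_rms = TfRunningMeanStd(shape=(), scope='ret_rms') if ret else None
        ...
```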

Apparently, there is an alternative version of baselines which does not have this problem, stable_baselines; I included it in the repo and I'm playing around with it to try and get it training. @awjuliani you may want to look at this repo - it supports vectorized environments, but it wasn't clear how to get it to work with a multi-agent ML-Agents environment.

gnouhp commented 5 years ago

@Sohojoe - I spent a few hours looking at your scripts and they gave me a lot of insight into possible solutions. I did get stuck trying to run the DeepMindHopper marathon env (I built the env and set the brains to external). When I ran:

mpiexec -n 4 python -m baselines.run_unity --alg=ppo2 --env=./envs/Hopper --num_timesteps=2e5 --save_path=./models/hopper_200k_ppo2

I'm seeing the exception:

gym_unity.envs.unity_env.UnityGymException: The environment was launched as a single-agent environment, however there is more than one agent in the scene.

Also, were you able to get any envs with visual observations training?

Sohojoe commented 5 years ago

@gnouhp - you need to build as a single agent

[Screenshot: "screen shot 2018-10-13 at 9 29 14 am"]
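For reference, the gym wrapper side has to match the build: a minimal sketch, assuming the 0.5-era UnityEnv signature, where multiagent=False (the default) raises the UnityGymException above whenever the built scene contains more than one agent.

```python
from gym_unity.envs import UnityEnv

# Single-agent mode: the build itself must contain exactly one active agent,
# otherwise the wrapper raises UnityGymException at launch.
env = UnityEnv("./envs/Hopper", worker_id=0, use_visual=False, multiagent=False)
```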

Which platform are you using (i.e. windows / mac / linux ) and which baselines algorithms are you hoping to use?

I've not tried visual observations.

gnouhp commented 5 years ago

@Sohojoe - I'm using Windows 10. The baselines algorithm that I'm trying to use is ppo2, for the training speedups. My intention was to test the baselines ppo2 algorithm against visual ml-agents envs such as GridWorld, PushBlock, etc., and compare its performance to the benchmark performance of the PPO model used by mlagents-learn, with the same Brain parameters as defined in the trainer config file. I've been able to test the latter and it runs without any issues.
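For visual envs like GridWorld, the wrapper would presumably be opened with use_visual=True; a sketch under the same 0.5-era API assumption:

```python
from gym_unity.envs import UnityEnv

# use_visual=True exposes the agent's camera observation instead of the
# vector observation; the observation space becomes a Box of pixels.
env = UnityEnv("./envs/GridWorld", worker_id=0, use_visual=True)
print(env.observation_space)  # e.g. Box(84, 84, 3) for an 84x84 RGB camera
```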

For the marathon envs, I disabled the agents DeepMindHopper(1) - DeepMindHopper(15) and built the env. I tried building in two ways: once with the remaining active DeepMindHopper agent's tag set to "Untagged", and once with it set to "Player". I ran the command multiple times without getting any errors, and each time the 4 training rendering windows open. However, each time one or more of the windows displays "Unity Environment (Not Responding)". I think it could be a compatibility issue with the Windows MPI package.

I appreciate the help, thank you very much.

Adrelf commented 5 years ago

Hi everyone. I want to use the Gym Wrapper (UnityEnv). I followed all the instructions, and when I launch my script I get the following error: mlagents.envs.exception.UnityEnvironmentException: The API number is not compatible between Unity and python. Python API : API-5, Unity API : API-4. I already have the latest version of ML-Agents. I don't know what to do. Thanks for your help.

awjuliani commented 5 years ago

Hi @Adrelf,

The version released with ML-Agents v0.5 is API-5. It may be that your Unity project is outdated. Here is the line which defines the version on the master branch: https://github.com/Unity-Technologies/ml-agents/blob/master/UnitySDK/Assets/ML-Agents/Scripts/Academy.cs#L95.

Adrelf commented 5 years ago

Thanks awjuliani for your help.

Sohojoe commented 5 years ago

@gnouhp how are you getting on?

I got a multi-agent (16 agents) environment training with ppo2. My code is a bit hacky, but in my early tests I saw roughly 2x the performance of ppo1.
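One plausible shape for that kind of adapter (hypothetical, not Sohojoe's actual code): expose the multi-agent UnityEnv's per-agent lists through baselines' VecEnv interface, so ppo2 treats the 16 agents in one Unity process as 16 vectorized envs.

```python
import numpy as np
from baselines.common.vec_env import VecEnv
from gym_unity.envs import UnityEnv

class UnityVecEnv(VecEnv):
    """Hypothetical adapter: one multi-agent Unity env -> a baselines VecEnv."""

    def __init__(self, env_path):
        self.env = UnityEnv(env_path, worker_id=0, multiagent=True)
        VecEnv.__init__(self, self.env.number_agents,
                        self.env.observation_space, self.env.action_space)

    def reset(self):
        return np.asarray(self.env.reset())

    def step_async(self, actions):
        self._actions = list(actions)

    def step_wait(self):
        # The multi-agent UnityEnv steps all agents in a single call and
        # returns per-agent lists of observations, rewards, and dones.
        obs, rews, dones, info = self.env.step(self._actions)
        return np.asarray(obs), np.asarray(rews), np.asarray(dones), [info] * self.num_envs

    def close(self):
        self.env.close()
```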

harperj commented 5 years ago

Just wanted to give an update on this issue. We've updated the documentation in this PR - the updated docs will ship with the upcoming v0.6 release, but you should be able to use the same example code with v0.5.

harperj commented 5 years ago

I'm going to close this issue for now, but feel free to reopen if you have further issues.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.