Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Help Wanted : Parallel training #828

Closed aurelien78 closed 5 years ago

aurelien78 commented 6 years ago

Hello,

First, thank you for sharing this work! I know there is a worker-id option on the learn.py script that allows running multiple training sessions at the same time. However, as far as I understand, those instances do not train the same model; they train multiple models in parallel. Hence my question: is it possible to launch more than one instance of the Unity executable (each corresponding to one training scenario) so that all of them train the same model in TensorFlow?

My goal is to use as many CPUs as available (52 in my case) to train a single model. Since each Unity executable is single-core, I think this makes sense.

What do you all think ?

Thanks

mmattar commented 6 years ago

Hi @aurelien78 - that is a great idea to enable scaling the training to multiple cores and machines. We are currently working on a version that will enable this.

superjayman commented 6 years ago

Also need this feature; scaling across multiple machines would be great! Any idea on the ETA for this? Please release it soon. Cheers

aurelien78 commented 6 years ago

Great news!!

aurelien78 commented 6 years ago

Hi,

Any news about this feature?

Thanks a lot.

Amazing work!

ervteng commented 5 years ago

This feature is still in the works; we'll communicate further when it is available in a future release. In the meantime, take a look at our OpenAI Baselines gym wrapper. The PPO example shown at the bottom of that page (which uses OpenAI's PPO implementation) uses multiprocessing to spawn multiple Unity environments.
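For readers unfamiliar with the pattern, here is a minimal sketch of the multiprocessing approach: each worker process owns one environment instance and collects rollouts independently. The `DummyEnv` class is a hypothetical stand-in so the sketch runs without Unity; in real code you would construct the Unity gym environment in the worker instead, passing each process a distinct `worker_id` so the instances bind different ports.

```python
import multiprocessing as mp

class DummyEnv:
    """Hypothetical stand-in for a gym-wrapped Unity environment."""
    def reset(self):
        return 0.0  # initial observation

    def step(self, action):
        # Returns (observation, reward, done, info), gym-style.
        # This toy env ends every episode after one step with reward 1.0.
        return 0.0, 1.0, True, {}

def collect_episode(rank):
    # In real code, build the Unity env here with a per-process worker id,
    # e.g. something like UnityEnvironment(..., worker_id=rank) wrapped for
    # gym (exact constructor names depend on the ml-agents version).
    env = DummyEnv()
    obs, total_reward, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(0)  # fixed action for the sketch
        total_reward += reward
    return total_reward

if __name__ == "__main__":
    # Four worker processes, each running one episode in its own env copy.
    with mp.Pool(processes=4) as pool:
        rewards = pool.map(collect_episode, range(4))
    print(rewards)  # prints [1.0, 1.0, 1.0, 1.0]
```

The experience gathered by the workers would then be fed to a single PPO learner, which is what keeps the result a single shared policy.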

ervteng commented 5 years ago

Hi all, great news! (actually)

There is a parallel trainer available on our develop branch, which we will release on master very soon. Give it a go! Just add --num-envs=X to mlagents-learn to open more than one Unity environment at a time.
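As a concrete illustration (paths, config file name, and run id below are placeholders, not from this thread), a parallel run might look like:

```shell
# Train one shared policy while sampling from 4 concurrent copies
# of the same Unity build. builds/MyEnv, trainer_config.yaml and
# the run id are hypothetical placeholders.
mlagents-learn config/trainer_config.yaml \
    --env=builds/MyEnv \
    --num-envs=4 \
    --run-id=parallel_run \
    --train
```

All four environment instances feed experience to the same trainer, so the output is still a single model.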

sunirisgrace commented 4 years ago

Hi, I have some questions about parallel training in different scenarios:

1) I wonder whether there is one brain or multiple brains after parallel training is finished.
2) Is there one policy or are there multiple policies during parallel training?
3) How do I implement parallel training in different scenarios? If I have two scenarios, should I write one training command or two? Such as: mlagents-learn < trainer-config-file > --num-envs=<2> --env = < env_name1 > --run-id = --env = < env_name2 > --run-id = --train.
4) During training for one scenario, we can see the value of the reward. How do I differentiate the rewards for two scenarios if I train them concurrently?

ervteng commented 4 years ago

1) One brain. Parallel training has the same effect as having multiple training areas/agents in one scene.
2) One policy per brain. If there's only one brain in the scene, there will be only one policy as well.
3) Parallel training is for multiple copies of the same env, so you only pass --env once, together with --num-envs. Currently we don't support parallel training with different environments.
4) All environment instances need to be identical, and you'll see the average reward across all of them.

sunirisgrace commented 4 years ago

@ervteng Your official blog post mentions: "Today, we are introducing the ability to train faster by having multiple concurrent instances of Unity on a multi-core machine" and "The changes we provide in v0.8 enable a training speedup of 5.5x on easy levels and up to 7.5x on harder levels by leveraging 16 Unity simulations. Generally speaking, the gains of utilizing multiple Unity simulations are greater for more complex levels and games." Does this mean that ML-Agents trains 16 simulations of one scenario concurrently and finally produces one policy that enables the agent to solve many levels?

ervteng commented 4 years ago

Yep! Our example in the blog ran fairly slowly, so parallelizing it produced many more samples per second. But you'll still end up with one policy.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.