Concurrent training v0.8

Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.

https://unity.com/products/machine-learning-agents


Concurrent training v0.8 #1948

Closed caioc2 closed 5 years ago

caioc2 commented 5 years ago

Hi,

I've tried to use concurrent training in v0.8, but I'm not sure whether the time/steps reported in the console output are correct.

For instance, I'm running a machine with 4 cores/8 threads. If I set --num-envs=1, it uses about 27% of the CPU, and the console shows it taking about 10 seconds to complete 1000 steps.

If I set --num-envs=8, it uses about 90% of the CPU, and the console shows it taking about 60 seconds to complete 1000 steps.

On the other hand, the generated CSV with statistics shows the opposite:

With 1 env running, "Time for last experience collection" is ~12 seconds, and with 8 envs running it is ~3 seconds, which is what I would expect from concurrent training.

So I believe the console statistics are not consistent with the CSV, or they need a better explanation of what they show.

sterlingcrispin commented 5 years ago

@caioc2 it seems that no matter how many environments I set, my console outputs the progress as INFO:mlagents.trainers: run-id-name-0. This makes me think it's the zeroth instance and that it's not actually running concurrent sessions, only one. Are you seeing the same thing, or do you see run-id-name-1, 2, and so on?

caioc2 commented 5 years ago

So far I believe it is running concurrently in the sense that multiple instances are running, but the statistics are "individual": the steps are not the total steps of all instances; rather, each instance has run for x steps.

So the concurrency is not exactly what one expects, and running with a different number of envs gives you a totally different training setup. Hence, to compare results, you have to run with the same number of envs.

This kind of implementation is somewhat lacking for research. A better way of doing it would be at a lower level, maybe implementing it directly in Unity's agent, but that would surely require a lot more work.

roboserg commented 5 years ago

@caioc2 wait, I thought concurrent training of many instances would work exactly like running several instances of the training area within one editor window: many agents, one brain. Even in the blog post, they talk about being able to speed up training 5-7x with concurrent training. Are you sure those concurrent instances are independent? That doesn't align with what they say in the blog post.

caioc2 commented 5 years ago

No, they are not independent in that sense. Every experience gathered goes into training the same brain. What I meant is that the statistics of a run with one env cannot be compared with those of a run with n envs, because you are not just doing more computation at once, you are doing it somewhat differently.

The speedup is highly dependent on your environment. To get anything near that 5-7x, the simulation time (cost) must be much greater than the network training time.

I myself got about a 30-50% speedup.
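One rough way to see why the speedup depends on the ratio of simulation cost to training cost is an Amdahl's-law style model. This is a simplifying assumption, not something ML-Agents computes: simulation work is assumed to split evenly across the environment processes, while network training stays serial, and communication overhead is ignored. The input timings below are hypothetical.

```python
def estimated_speedup(sim_time: float, train_time: float, num_envs: int) -> float:
    """Speedup of num_envs parallel environments over a single one.

    Assumed model: sim_time and train_time are the single-env costs of
    producing and training on one batch of experience. Simulation is
    divided across num_envs processes; training remains serial.
    """
    if num_envs < 1:
        raise ValueError("num_envs must be at least 1")
    serial = sim_time + train_time
    parallel = sim_time / num_envs + train_time
    return serial / parallel


# Hypothetical simulation-dominated workload: speedup moves toward num_envs.
print(round(estimated_speedup(sim_time=9.0, train_time=1.0, num_envs=8), 2))  # 4.71
# Hypothetical training-dominated workload: speedup stays small.
print(round(estimated_speedup(sim_time=1.0, train_time=3.0, num_envs=8), 2))  # 1.28
```

Under this model, adding environments only shrinks the simulation term, so a workload dominated by TensorFlow training time cannot get close to a 5-7x gain however many envs are launched.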

roboserg commented 5 years ago

@caioc2 do you absolutely need to build the environment to use concurrent training? Do you run several training areas within each individual training area?

caioc2 commented 5 years ago

Yes, you need to build it, at least as far as I know.

There is a difference between using several training areas in the same scene and running several envs. You can even use both together.

Do you run several training areas within each individual training area?

I'm not sure I understood what you meant.

roboserg commented 5 years ago

I meant several training areas within one Unity scene / editor. How do you balance that against several instances of Unity?

caioc2 commented 5 years ago

In my case it was trial and error. I put enough training areas in the scene to saturate one core and ran as many instances as the number of cores/hyper-threads.

PS: I'm using the built game not the editor for concurrency.
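The trial-and-error heuristic above can be sketched as a small planning helper. It assumes you already found, by experiment, how many training areas saturate about one core; the helper then picks one built executable per logical CPU and assembles an mlagents-learn command line. The build path, run id, and trainer config path are placeholders, and the command uses the documented `--env`, `--run-id`, `--train`, and `--num-envs` options; check them against the docs for your ML-Agents version.

```python
import os
import shlex


def plan_concurrent_run(env_path: str, run_id: str, areas_per_scene: int,
                        trainer_config: str = "config/trainer_config.yaml"):
    """One built executable per logical CPU, each holding areas_per_scene areas.

    areas_per_scene is an assumed, experimentally chosen value that keeps
    roughly one core busy. Returns the instance count, the total number of
    training areas feeding the single brain, and a shell-ready command.
    """
    num_envs = os.cpu_count() or 1  # logical CPUs (cores x hyper-threads)
    total_areas = areas_per_scene * num_envs
    cmd = [
        "mlagents-learn", trainer_config,
        f"--env={env_path}", f"--run-id={run_id}",
        "--train", f"--num-envs={num_envs}",
    ]
    return num_envs, total_areas, shlex.join(cmd)


# Hypothetical build path and run id:
num_envs, total_areas, command = plan_concurrent_run(
    "builds/MyEnv", "concurrent-test", areas_per_scene=10)
print(num_envs, total_areas)
print(command)
```

Treat the result as a starting point: if the machine is already near 100% CPU with fewer instances, extra envs mostly compete with training for the same cores.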

roboserg commented 5 years ago

Hmm, I run 10 training areas within one unity editor and it uses all my 4 cores (+4 HT):

https://puu.sh/Dte4U/3900d51b58.png

Looks like running several instances of the game, each with multiple training areas, won't make sense in my case? I have an i7-6700K with 4 cores and 4 HT.

caioc2 commented 5 years ago

The operating system decides which core a process runs on, and it isn't fixed. In this context, when one says something is using "one core", it means the processor usage doesn't get near 100%. For instance, on a quad-core, if your usage is about 25-30%, one would say it is using one core; that doesn't mean one core is at 100% and the others at 0%, unless you explicitly set that up in your system. (Of course there are other cases where that can happen, but I'm keeping it simple.)

In your case it is topping out at 66%, which is not bad, but with 2 instances running it could reach 90-100% and you would gain some speed.
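The "one core" reasoning above amounts to converting a total CPU percentage into a core-equivalent figure. A minimal sketch, assuming the percentage is averaged over the cores you pass in. Hyper-threads are not full cores, so the remaining headroom is an optimistic upper bound rather than a guaranteed gain.

```python
def cores_in_use(cpu_percent: float, num_cores: int) -> float:
    """Core-equivalents of work implied by a total CPU-usage percentage.

    The OS spreads threads over all cores, so 25% on a quad-core reads as
    about one core's worth of work, not one core pinned at 100%.
    """
    return cpu_percent / 100.0 * num_cores


def headroom(cpu_percent: float, num_cores: int) -> float:
    """Core-equivalents still free for extra environment instances."""
    return num_cores - cores_in_use(cpu_percent, num_cores)


print(cores_in_use(25, 4))        # 1.0
print(round(headroom(66, 4), 2))  # 1.36
```

With about 66% usage on 4 physical cores, more than one core's worth of capacity is idle, which is why a second instance can push total usage toward 90-100%.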

Another factor in CPU usage is what you are doing and using in Unity. In my case I don't use physics or anything heavy that Unity already multi-threads internally; everything is done in my own C# scripts. That's why it usually sits at 25-30% CPU usage with a single scene/executable while simulating. When training, it uses everything, since TensorFlow is highly optimized, and that's why my total gains are not large: most of the time is spent training, not simulating.

TL;DR

If you are not near 100% CPU usage, you can try concurrency and get some gains.

roboserg commented 5 years ago

Thanks. I use some physics, and the Unity Editor uses only 4-8% of the total CPU time. I'll play around with headless mode and several instances.

harperj commented 5 years ago

Hi @caioc2 -- the reason the output in the console is different from the output in the CSV is that the console is counting in "steps", which is a unit of a single environment step for all environments. So if you have 8 copies of your environment a single "step" accounts for all environments moving forward by a single step. The CSV output is in terms of "experiences", which refer to a single experience (tuple of action, observation, reward) from a single environment.

I agree that we could make this easier to understand, but for now I hope this clears up what's going on.
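Taking the explanation above at face value (one console step advances every environment once), console steps can be converted into experiences to compare runs with different env counts. The timings below are the approximate console figures from the original report; `agents_per_env` is an assumption for scenes with several agents or training areas, and it cancels out of the ratio as long as both runs use the same scene.

```python
def experiences_from_steps(console_steps: int, num_envs: int,
                           agents_per_env: int = 1) -> int:
    # One console step = every environment moving forward by one step,
    # so each step yields one experience per agent in each environment.
    return console_steps * num_envs * agents_per_env


def experiences_per_second(console_steps: int, seconds: float, num_envs: int,
                           agents_per_env: int = 1) -> float:
    return experiences_from_steps(console_steps, num_envs, agents_per_env) / seconds


# Console timings from the report: 1000 steps in ~10 s (1 env) and ~60 s (8 envs).
one_env = experiences_per_second(1000, 10, num_envs=1)
eight_envs = experiences_per_second(1000, 60, num_envs=8)
print(one_env, round(eight_envs, 1), round(eight_envs / one_env, 2))  # 100.0 133.3 1.33
```

Read this way, the slower-looking 8-env console output actually collects roughly a third more experience per second, in line with the 30-50% speedup mentioned earlier in the thread.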

caioc2 commented 5 years ago

My concern, which may not be relevant for other applications, is about the architecture. Implementing anything based on steps, for example a custom curriculum, will give different results depending on the number of environments. In my (anecdotal) case, this dependency is unwelcome.

Anyway, it is completely clear now. Thanks!
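For a custom curriculum like the one described above, one workaround is to define lesson boundaries in experiences and convert console steps before the lookup. This is a hypothetical sketch, not the ML-Agents curriculum API: the thresholds are made up, and it assumes one agent per environment.

```python
from bisect import bisect_right

# Hypothetical lesson boundaries, counted in experiences (not console steps).
LESSON_THRESHOLDS = [0, 50_000, 200_000]


def lesson_for(experiences: int) -> int:
    """Index of the last lesson whose threshold has been reached."""
    return bisect_right(LESSON_THRESHOLDS, experiences) - 1


def lesson_from_console(console_steps: int, num_envs: int) -> int:
    """Normalize console steps to experiences first, so the same amount of
    collected experience maps to the same lesson for any --num-envs value."""
    return lesson_for(console_steps * num_envs)


# 50,000 experiences collected either way:
print(lesson_from_console(50_000, num_envs=1))  # 1
print(lesson_from_console(6_250, num_envs=8))   # 1
# Treating console steps as experiences lags behind with 8 envs:
print(lesson_for(6_250))                        # 0
```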

harperj commented 5 years ago

@caioc2 loud and clear on the issues with steps -- we're discussing internally how best to fix this in an upcoming release.

caioc2 commented 5 years ago

Thanks, I'm glad to hear about that.

xiaomaogy commented 5 years ago

Thank you for the discussion. We are closing this issue due to inactivity. Feel free to reopen it if you’d like to continue the discussion though.
