Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Request for mlagents processing benchmark for visual observations #1581

Closed markpwoodward closed 5 years ago

markpwoodward commented 5 years ago

I think it would be good to have a processing benchmark for ML-Agents. We could then try to improve it through driver/CUDA/Unity flags and code optimization.

An initial benchmark could be the GridWorld environment that ships with the SDK.

grid_world_speed_test.py:

import mlagents.envs

env = mlagents.envs.UnityEnvironment(file_name="ml-agents/ml-agents/mlagents/envs/GridWorld")
env.reset()

# Step with a fixed action 1000 times; no learning happens here.
for i in range(1000):
    env.step([0])

env.close()

Here are my initial numbers:

GPU based:
$ time python grid_world_speed_test.py

Xvfb based:
$ time xvfb-run -s "-screen 0 1024x768x24" python grid_world_speed_test.py

A concerning trend is that speeds seem to get slower as the GPU gets more powerful.
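For comparing setups, wall-clock `time` also counts Unity startup and shutdown; a steps-per-second figure isolates stepping cost. Here is a minimal sketch of such a harness; the `step_fn` indirection and warm-up count are my own choices for illustration, not part of the ml-agents API (with ml-agents you would pass something like `lambda: env.step([0])`):

```python
import time

def steps_per_second(step_fn, n_steps=1000, warmup=10):
    """Time n_steps calls to step_fn, after a short untimed warm-up."""
    for _ in range(warmup):  # let caches and lazy initialization settle
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed

# Example with a dummy step function standing in for env.step([0]):
rate = steps_per_second(lambda: None, n_steps=10000)
print(f"{rate:.0f} steps/s")
```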

eshvk commented 5 years ago

I think this is a good idea, we have been thinking about something similar, I will keep you posted when we release something related.

eshvk commented 5 years ago

I am going to reopen this issue so that we can use it to post updates once we get some more traction on this.

eshvk commented 5 years ago

@markpwoodward I was re-reading your results (which are very counterintuitive). One thought on the GPU computation: how many vCPUs does your gcloud machine have? I noticed that the processing power of the (GCP) CPU is lower than that of both Lenovos.

I have a hunch that if you had identical performance CPUs on the GCP machine as the Lenovo machines, you should get equal performance. This assumes that the V100 is not being used efficiently for game rendering by Unity (and we can ignore the use of the GPU for learning for this discussion b/c we are doing PPO with a single Actor configuration which doesn't fill up the GPU buffer).

markpwoodward commented 5 years ago

@eshvk Oops, I didn't mean to close this issue. The gcloud machine has 96 vCPUs and 8 V100s :), and nothing else was running but this test. I agree with your hunch about the CPU being the limiting factor. Performance seems to track CPU speed, with a minor bump when rendering on the GPU vs. the CPU on each platform. [Edit] Actually, not quite: the ThinkStation CPU has higher base and boost clocks, but maybe boost is disabled... My two cents is to focus on the cloud systems, which are more standardized. [/Edit]

The example doesn't do any learning (PPO or otherwise). At least that is my intention. I was thinking that this benchmark would be just rendering, not training.

My two cents on the visual benchmark environment is that it should have a "batchSize" resetParameter that dynamically creates Areas. The number of Areas is an important tradeoff: for a batch size of 32 you could run 8 environments, each with 4 Areas, or 32 environments, each with 1 Area, and processing time doesn't grow 1:1 with the number of Areas. I did this for my own environment and found a sweet spot.
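The envs-vs-Areas tradeoff can be reasoned about with a toy cost model. A sketch, under the assumption (mine, purely for illustration, not measured from ml-agents) that each environment process pays a fixed per-step overhead plus a per-Area cost that grows slightly superlinearly with scene size:

```python
# Toy cost model for splitting a fixed batch of observations across
# environment processes and Areas. All constants are illustrative guesses.
FIXED_MS = 5.0      # per-step fixed overhead of one env process (IPC etc.)
PER_AREA_MS = 1.5   # base per-Area rendering cost (ms)
AREA_EXP = 1.2      # per-Area cost grows superlinearly with scene size

def batch_step_ms(batch_size, areas_per_env):
    """Wall-clock ms to collect one batch, stepping envs sequentially."""
    n_envs = batch_size // areas_per_env
    per_env_ms = FIXED_MS + PER_AREA_MS * areas_per_env ** AREA_EXP
    return n_envs * per_env_ms

# Compare splits that all produce 32 observations per step.
for areas in [1, 2, 4, 8, 16, 32]:
    print(f"{32 // areas:2d} envs x {areas:2d} Areas -> "
          f"{batch_step_ms(32, areas):6.1f} ms/batch")
```

Under these made-up constants the cheapest split sits in the middle (8 envs x 4 Areas or 4 envs x 8 Areas), which is the "sweet spot" behavior described above: fixed overhead favors fewer envs, superlinear per-Area cost favors fewer Areas per env.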

eshvk commented 5 years ago

@markpwoodward With 96 vCPUs (IIRC, each vCPU corresponds to 1 hyperthread), I suppose what this means is that Unity is exploiting base cycle speed where the Lenovos are doing better. On the GPU, another thing that would be worthwhile investigating is the percentage of the GPU being used at any time. I am guessing a very small percentage of the V100 is being used.

Yes, I have been looking at benchmarking in the context of GCP with dockerized containers in isolated VMs b/c there are way more hidden parameters to deal with on consumer laptops.
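GPU utilization during a run can be sampled from the command line with `nvidia-smi`. A sketch; the polling helper below is my own, not part of ml-agents, and it assumes `nvidia-smi` is on the PATH when queried live:

```python
import subprocess

def gpu_utilization_percent(output=None):
    """Return per-GPU utilization (%) parsed from nvidia-smi CSV output.

    If `output` is None, query nvidia-smi directly; otherwise parse the
    given string (useful for testing without a GPU).
    """
    if output is None:
        output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            text=True,
        )
    return [int(line.strip()) for line in output.splitlines() if line.strip()]

# Parsing a canned sample (what an 8-GPU machine might print):
sample = "3\n0\n0\n0\n0\n0\n0\n0\n"
print(gpu_utilization_percent(sample))  # -> [3, 0, 0, 0, 0, 0, 0, 0]
```

Polling this in a loop alongside the benchmark would show whether the V100 is ever meaningfully busy during rendering.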

markpwoodward commented 5 years ago

Also, my assumption was that GPU rendering would really shine over CPU rendering as I increased the number of Areas, but I did not experience that. Another point in favor of a benchmark that can vary the number of Areas.

markpwoodward commented 5 years ago

@eshvk Correct. It is only ~3% of the V100.

nikola-j commented 5 years ago

Does Xvfb work with environments that use a camera? When I try to run my env I get:

Xlib:  extension "NV-GLX" missing on display ":99".
Traceback (most recent call last):
    "The Unity environment took too long to respond. Make sure that :\n"
mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
     The environment does not need user interaction to launch
     The Academy and the External Brain(s) are attached to objects in the Scene
     The environment and the Python interface have compatible versions.

awjuliani commented 5 years ago

@nikola-j

Our Docker setup uses xvfb specifically for environments that use cameras. See the Dockerfile here: https://github.com/Unity-Technologies/ml-agents/blob/master/Dockerfile

markpwoodward commented 5 years ago

@nikola-j Also take a look at the following issue + solution: https://github.com/Unity-Technologies/ml-agents/issues/1574

nikola-j commented 5 years ago

@awjuliani @markpwoodward Thanks guys, the #1574 solution worked!

harperj commented 5 years ago

Thanks for the suggestion. I've added it to our internal tracker with the ID MLA-73. I’m going to close this issue for now, but we’ll ping back with any updates.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.