Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Headless Rendering on GPU via SDL+CUDA without X-server #1050

Closed rmst closed 5 years ago

rmst commented 6 years ago

Hi,

Applications based on SDL (e.g. all Unreal Engine apps) now seem to render on Nvidia GPUs without an X server (https://github.com/carla-simulator/carla/issues/225). Unity3D seems to support SDL as well according to its changelogs (e.g. https://unity3d.com/de/unity/whats-new/unity-2017.3.0), but I couldn't make it work with the simple trick from the link above, which works for Unreal-based applications. Any ideas how to make that work?

Best,

Simon

Edit: The way to get SDL apps to render without X and through CUDA instead is to:

export SDL_VIDEODRIVER=offscreen
export SDL_HINT_CUDA_DEVICE=0
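
If the binary is started from Python rather than from a shell, the same two variables can be passed through the child process environment. This is only a minimal sketch: the helper names `offscreen_env` and `launch_offscreen` and the binary path are hypothetical, and it only sets the variables. Whether the application honors them depends on how it was built against SDL.

```python
import os
import subprocess


def offscreen_env(cuda_device=0, base_env=None):
    """Return a copy of the environment with the two SDL variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SDL_VIDEODRIVER"] = "offscreen"
    env["SDL_HINT_CUDA_DEVICE"] = str(cuda_device)
    return env


def launch_offscreen(binary_path, cuda_device=0):
    """Start the binary with the offscreen variables; returns a Popen handle."""
    # binary_path is a placeholder, e.g. the path to an SDL-based executable.
    return subprocess.Popen([binary_path], env=offscreen_env(cuda_device))
```

Building the environment in a separate function keeps the parent process environment untouched, so other child processes are not affected.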

shihzy commented 6 years ago

Hi Simon - I don't believe we have support for this yet in ML-Agents. Can you provide any other information or logs on what you are seeing when you try?

rmst commented 6 years ago

I tried this a month ago and I think the binary just crashed with a segfault, as usual when there is no X server present. I had hoped that there was some secret compile flag that we could set to compile the binary with SDL, but that's not the case? What do those changelogs refer to, then, when they say something like "Linux: Upgraded SDL to version 2.05"?

shihzy commented 6 years ago

Thanks @rmst, we have a few folks trying to resolve this on our end. Will keep you posted on the progress.

rmst commented 6 years ago

Please do! Thanks :)

rmst commented 6 years ago

Hi,

I tested it again and on the same headless machine I can successfully run my Unity application when X is running but it fails without X (with the above environment variables set), getting the following console output: https://gist.github.com/rmst/61a94c6a3d2704c3be593bbe67b1b1c2

There is also this Twitter thread (https://twitter.com/natosha_bard/status/796466784803192832) from two years ago, which makes it sound as if all X11 dependencies have been removed from Unity on Linux and Unity now relies solely on SDL2. Also, I can confirm that Unity at least reacts to SDL_VIDEODRIVER=offscreen -- by crashing even when an X server is running :P. On the other hand, the error message above contains references to /usr/lib64/libX11.so.6.3.0, which kinda looks like a dependency on X.

As for SDL_VIDEODRIVER=offscreen it seems like this is pretty new. It isn't even listed on the SDL site https://wiki.libsdl.org/FAQUsingSDL. The only reason I expected it to work in Unity is because it works in Unreal.

It would be really great to get more information about this from you. Perhaps it's only a little thing I missed that could make it work (maybe @natosha knows more?).

Best,

Simon

xiaomaogy commented 6 years ago

@unityjeffrey Do you have any more information on this?

rmst commented 6 years ago

I'd be curious as well. Also, I don't think the "help wanted" label is appropriate here. This question is about the capabilities / internals of the engine and can only be resolved by Unity3D people.

aPere3 commented 5 years ago

Any update on this topic? I would also be very interested in running ml-agents without having to start an X server.

On my side, when trying on our cluster, setting the environment variables:

export SDL_VIDEODRIVER=offscreen
export SDL_HINT_CUDA_DEVICE=0

seems to have an effect, as the Unity binary starts without segfaults, but Python is unable to connect to it:

Found path: /home/apere/pyramid/pyramid.x86_64
Mono path[0] = '/home/apere/pyramid/pyramid_Data/Managed'
Mono config path = '/home/apere/pyramid/pyramid_Data/MonoBleedingEdge/etc'
Preloaded 'ScreenSelector.so'
Preloaded 'libgrpc_csharp_ext.x64.so'
Preloaded 'liblibtensorflow.so'
Preloaded 'libtensorflow_framework.so'
Logging to /home/apere/.config/unity3d/Unity Technologies/Unity Environment/Player.log
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/apere/anaconda3/envs/py-3.6-gpu/lib/python3.6/site-packages/gym_unity/envs/unity_env.py", line 34, in __init__
    self._env = UnityEnvironment(environment_filename, worker_id)
  File "/home/apere/anaconda3/envs/py-3.6-gpu/lib/python3.6/site-packages/mlagents/envs/environment.py", line 67, in __init__
    aca_params = self.send_academy_parameters(rl_init_parameters_in)
  File "/home/apere/anaconda3/envs/py-3.6-gpu/lib/python3.6/site-packages/mlagents/envs/environment.py", line 527, in send_academy_parameters
    return self.communicator.initialize(inputs).rl_initialization_output
  File "/home/apere/anaconda3/envs/py-3.6-gpu/lib/python3.6/site-packages/mlagents/envs/rpc_communicator.py", line 61, in initialize
    "The Unity environment took too long to respond. Make sure that :\n"
mlagents.envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
     The environment does not need user interaction to launch
     The Academy and the External Brain(s) are attached to objects in the Scene
     The environment and the Python interface have compatible versions.

The Player.log is:

Desktop is 0 x 0 @ 0 Hz
Vulkan detection: 0

Any idea how I could manage to start the environment?
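
To tell this situation apart from an ordinary connection timeout, the Player.log can be checked for the desktop line shown above. The following is a rough sketch: the function names are hypothetical, and reading "0 x 0" as "no usable display" is an assumption based on this log, not documented Unity behavior.

```python
import re

# Matches lines such as "Desktop is 0 x 0 @ 0 Hz" from Player.log.
DESKTOP_RE = re.compile(r"Desktop is (\d+) x (\d+) @ (\d+) Hz")


def parse_desktop(log_text):
    """Return (width, height, refresh_hz) from the log text, or None if absent."""
    match = DESKTOP_RE.search(log_text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def has_usable_display(log_text):
    """Assumption: a zero-sized desktop means the player found no display."""
    desktop = parse_desktop(log_text)
    return desktop is not None and desktop[0] > 0 and desktop[1] > 0
```

Running `has_usable_display` on the log above would return False, which would point at the display setup rather than at the Academy or version mismatches listed in the exception message.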

harperj commented 5 years ago

Hi all, Unity does have an explicit dependency on X at this time and we can't provide any specifics on if/when this might change.

The docs on AWS training or CPU rendered Docker training might be useful for working around this.
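
As an illustration of the CPU-rendered route, one common approach is to wrap the launch command in `xvfb-run`, which starts a virtual X server (Xvfb) for the duration of the command. This is a sketch under assumptions: it requires Xvfb to be installed, the helper names are hypothetical, the screen size is arbitrary, and rendering happens in software rather than on the GPU.

```python
import shutil
import subprocess


def xvfb_command(command, screen="640x480x24"):
    """Prefix a command list with xvfb-run and a virtual screen definition."""
    return [
        "xvfb-run",
        "--auto-servernum",  # pick a free X display number automatically
        "--server-args=-screen 0 " + screen,
    ] + list(command)


def run_under_xvfb(command):
    """Launch the command under Xvfb; raises if xvfb-run is not installed."""
    if shutil.which("xvfb-run") is None:
        raise RuntimeError("xvfb-run not found; install Xvfb first")
    return subprocess.Popen(xvfb_command(command))
```

For example, `run_under_xvfb(["/path/to/environment.x86_64"])` would start a (placeholder) Unity binary against the virtual display. This still uses an X server, just a virtual one, so it works around the dependency rather than removing it.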

Since this isn't actionable at this time and the conversation has become stale I'm going to close the issue, but feel free to re-open or create a new issue if you continue to have trouble.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.