Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

How to take advantage of GPU training on a Linux Server (AWS) and Docker #5677

Closed meiemari closed 2 years ago

meiemari commented 2 years ago

Hi team,

I am running a training via Docker on a Linux Server (headless), like this:

mlagents-learn trainer.yml --env=env-dist/UnityApp.x86_64 --run-id=$RUN_ID --force --debug --no-graphics --num-envs=10

The training time is significantly longer (sometimes 2 to 3 times) than when training locally on a Windows machine (without Docker).

I am training in the cloud on AWS G instances, the smallest being a g4dn.xlarge with 1 GPU, 4 vCPUs and 16 GB of memory. My local laptop beats the server every time!

Am I missing something?

My Unity setup:
Unity version: 2020.3.20f1
ML-Agents version: 1.0.8

Thanks in advance!
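One thing worth checking: `--num-envs=10` makes `mlagents-learn` launch ten separate Unity executables alongside the trainer process, while a g4dn.xlarge has only four vCPUs, so the environments may be competing for CPU time. The sketch below caps the environment count at the vCPUs left after reserving one for the trainer. The helper name `suggest_num_envs` and the one-environment-per-free-vCPU rule are assumptions for illustration, not ML-Agents behaviour or official guidance.

```python
import os
from typing import Optional


def suggest_num_envs(requested: int, vcpus: Optional[int] = None, reserved: int = 1) -> int:
    """Hypothetical helper: cap --num-envs at the vCPUs left after reserving some for the trainer.

    Assumed heuristic (not an ML-Agents rule): one Unity environment per free vCPU.
    """
    if vcpus is None:
        # os.cpu_count() can return None on some platforms
        vcpus = os.cpu_count() or 1
    free = max(1, vcpus - reserved)
    return max(1, min(requested, free))


if __name__ == "__main__":
    # g4dn.xlarge from the issue: 4 vCPUs, with --num-envs=10 requested
    print(suggest_num_envs(10, vcpus=4))  # prints 3
```

Timing a short run at a few values (for example 2, 3 and 4) would show whether the cap actually helps on this instance.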

meiemari commented 2 years ago

Additional information: I am using vector observations only, but training in the cloud is still much slower than on my local laptop. Any hints on how to solve this riddle are appreciated :) I am running headless with xvfb (Linux server with Docker). From older posts, it seems that xvfb could be a bottleneck: https://github.com/Unity-Technologies/ml-agents/issues/1846.

Is there an alternative to xvfb that I can use now?
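Because only vector observations are used, the run may not need a virtual display at all: `--no-graphics` tells ML-Agents to start the Unity executable without initializing the graphics driver, which the ML-Agents documentation describes as suitable only when agents do not use visual observations. The sketch below, a hypothetical helper named `build_learn_command`, wraps the command in `xvfb-run -a` only when visual observations are in play. The flags come from the command in the issue (with `--debug` left out); whether dropping xvfb removes the slowdown here is an assumption to verify, not a confirmed fix.

```python
import shlex
from typing import List


def build_learn_command(config: str, env_path: str, run_id: str,
                        num_envs: int, uses_visual_obs: bool) -> List[str]:
    """Hypothetical helper: build the mlagents-learn argv, adding xvfb-run only when needed."""
    cmd = [
        "mlagents-learn", config,
        f"--env={env_path}",
        f"--run-id={run_id}",
        "--force",
        f"--num-envs={num_envs}",
    ]
    if uses_visual_obs:
        # Visual observations need rendering, so give Unity a virtual X display.
        # -a lets xvfb-run pick a free server number.
        return ["xvfb-run", "-a", *cmd]
    # Vector-only: run Unity without graphics, so no X server is required.
    return [*cmd, "--no-graphics"]


if __name__ == "__main__":
    argv = build_learn_command("trainer.yml", "env-dist/UnityApp.x86_64", "run1", 3, False)
    print(shlex.join(argv))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues.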

kenminglee commented 2 years ago

Hey @meiemari, from my experience with ML-Agents, I would suggest a few changes to your setup that could improve training speed:

I have limited experience with ML-Agents, and with machine learning in general, so please take these suggestions with a grain of salt!

doctorpangloss commented 2 years ago

Your Docker container must be started with GPU access. Processes started with xvfb-run cannot use the GPU for graphics.
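For the first point: with Docker 19.03 or later and the NVIDIA Container Toolkit installed on the host, GPUs are exposed to a container through `docker run --gpus all`, and running `nvidia-smi` inside such a container is a common way to confirm the GPU is visible. The sketch below builds that command and wraps the check; the image name `my-mlagents-image` and both helper names are hypothetical. This only confirms that the GPU is reachable from the container; it does not make xvfb-based rendering use the GPU.

```python
import shutil
import subprocess
from typing import List, Sequence


def docker_run_with_gpu(image: str, inner_cmd: Sequence[str], gpus: str = "all") -> List[str]:
    """Hypothetical helper: argv for running a command in a container with GPU access."""
    # --gpus needs Docker 19.03+ and the NVIDIA Container Toolkit on the host.
    return ["docker", "run", "--rm", "--gpus", gpus, image, *inner_cmd]


def gpu_visible_in_container(image: str) -> bool:
    """Hypothetical helper: True if nvidia-smi exits successfully inside the container."""
    if shutil.which("docker") is None:
        return False  # Docker CLI not available on this machine
    result = subprocess.run(docker_run_with_gpu(image, ["nvidia-smi"]),
                            capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Show the command rather than running it, since it may pull an image.
    print(" ".join(docker_run_with_gpu("my-mlagents-image", ["nvidia-smi"])))
```

If `gpu_visible_in_container` returns False on a g4dn instance, the container was most likely started without `--gpus` or the host lacks the NVIDIA Container Toolkit.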

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had activity in the last 42 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.