pierreHaslee commented 1 year ago

GPU not used

I don't see any CUDA GPU usage when training a PPO policy.

I have properly installed pytorch and cudatoolkit using conda.

torch.cuda.is_available()

returns True, and I can display my GPU device name correctly, so the problem should not be on that side.
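For anyone reproducing this check, here is a minimal sketch that bundles `torch.cuda.is_available()` with the device names and the CUDA version the installed wheel was built against. It assumes you run it in the same conda environment that `mlagents-learn` uses. The function name `describe_cuda` is hypothetical, and the script reports "torch not installed" instead of crashing when PyTorch is missing.

```python
import importlib.util


def describe_cuda() -> dict:
    """Report whether PyTorch is importable and which CUDA devices it can see."""
    # Return early (instead of raising ImportError) if torch is not installed here.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    import torch

    available = torch.cuda.is_available()
    # Names of all visible CUDA devices; empty list when CUDA is unavailable.
    devices = (
        [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
        if available
        else []
    )
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    print(describe_cuda())
```

Note that a True result only shows that PyTorch *can* reach the GPU. It does not show that the trainer is actually placing work there.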

To Reproduce

Steps to reproduce the behavior:

My command line:

mlagents-learn ML-Agents\trainer_conf.yml --time-scale=20 --run-id recuriculum_boosts --resume --torch-device cuda

My training config file: trainer_conf.txt (uploaded as .txt, but it is actually a YAML file)

Screenshots

training start:

[Screenshot: cmdline]

It is training properly (rewards are bad because I tweaked my config files to try and debug, but I was able to train my model):

[Screenshot: cmdTrain]

My environment looks like this (not important):

[Screenshot: myEnv]

Here is my GPU usage while training (pardon my French):

[Screenshot: usageTraining]

As you can see, CUDA is not mentioned anywhere, and when I run nvidia-smi, all processes show N/A for GPU usage.
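Per-process columns in nvidia-smi commonly read N/A on Windows, so a device-level query can be more informative. The sketch below uses the standard `--query-gpu` and `--format=csv,noheader,nounits` options and parses the result. The helper names (`parse_gpu_csv`, `query_gpus`) are hypothetical. The script assumes one GPU per output line and that GPU names contain no commas.

```python
import shutil
import subprocess

# Device-level query: name, utilization (%), memory used/total (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def _to_int(field: str):
    # Unsupported fields come back as a non-numeric placeholder such as "[N/A]".
    return int(field) if field.isdigit() else None


def parse_gpu_csv(text: str) -> list:
    """Parse nvidia-smi CSV rows (noheader, nounits) into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        name, util, used, total = [field.strip() for field in line.split(",")]
        gpus.append({
            "name": name,
            "utilization_pct": _to_int(util),
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
        })
    return gpus


def query_gpus() -> list:
    """Run nvidia-smi if it is on PATH; return [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    # A non-zero exit usually means no driver or GPU is reachable.
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Running this once while idle and once during training gives a rough before/after comparison of utilization and memory.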

When I press Play in Unity (no training), the GPU usage is actually higher:

[Screenshot: usagePlaying]

I hope there is a fix because I'd really appreciate the speed boost from my RTX.

I'd also like to thank the devs behind this RL implementation in Unity. It is simply amazing, and I'm impressed by how well it works.

Environment:

Unity Version: Unity 2021.3.14f1
OS + version: Windows 10
ML-Agents version: 2.2.1-exp.1
Torch version: torch 1.7.1+cu110
Environment: my own env, I tested it as well with GridWorld with the same results.
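Version fields like these can be collected automatically instead of being copied by hand. Below is a small sketch using the standard-library `importlib.metadata` (Python 3.8+). It assumes the PyPI distribution names `mlagents`, `mlagents-envs` and `torch`, so adjust the tuple if your install differs. The Unity and OS lines still have to be filled in manually.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names; edit to match your environment.
PACKAGES = ("mlagents", "mlagents-envs", "torch")


def package_versions(names=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in package_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```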

Eelam commented 1 year ago

Did you install the CUDA Toolkit from NVIDIA? If so, check that your CUDA Toolkit version matches your Torch build: torch 1.7.1+cu110. I had an issue with the 'curiosity' parameter, which I fixed by installing the correct version of the CUDA tools: https://github.com/Unity-Technologies/ml-agents/issues/5793
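To compare versions as suggested above, you need the CUDA version encoded in the wheel's local tag (`+cu110` means CUDA 11.0). Here is a minimal sketch of that conversion. The helper name is hypothetical, and it assumes the tag follows the `+cuXYZ` pattern, where the last digit is the minor version.

```python
import re
from typing import Optional


def cuda_from_wheel_tag(torch_version: str) -> Optional[str]:
    """Turn a local version tag like '1.7.1+cu110' into '11.0'; None for CPU/other builds."""
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # Last digit is the minor version; the digits before it are the major version.
    return f"{digits[:-1]}.{digits[-1]}"


if __name__ == "__main__":
    print(cuda_from_wheel_tag("1.7.1+cu110"))
```

Pass it `torch.__version__` and compare the result with the toolkit version reported by NVIDIA's installer or `nvcc --version`.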

miguelalonsojr commented 1 year ago

We just cut Release 20. Can you update and give it a try?

pierreHaslee commented 1 year ago

I updated to Release 20 along with the new Unity extensions, and nothing changed. I also installed CUDA 11.6 from NVIDIA and tried multiple versions of PyTorch, including the latest and nightly builds. Nothing seems to work.

Edit: I also don't have any gym-unity package in this new installation, because I can't find the recommended version anywhere. gym-unity==0.30. is required for mlagents Release 20, but it is nowhere to be found.

pierreHaslee commented 1 year ago

Well, I'm sorry to have wasted your time, but I hadn't noticed that there is a CUDA option in the Performance tab of the Windows Task Manager...
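If the Task Manager CUDA graph stays flat and you want to confirm that it responds at all, one option is to run a deliberate PyTorch workload by hand and watch the graph. This is a hypothetical smoke test, not part of ML-Agents. It assumes a CUDA-enabled PyTorch install, and `size` and `iterations` are arbitrary values to increase if the spike is too brief to see.

```python
import importlib.util
from typing import Optional


def gpu_smoke_test(size: int = 4096, iterations: int = 50) -> Optional[int]:
    """Run repeated matmuls on the GPU; return bytes allocated by PyTorch, or None if no GPU."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None
    a = torch.randn(size, size, device="cuda")
    b = a
    for _ in range(iterations):
        b = a @ a  # each matmul launches CUDA kernels
    torch.cuda.synchronize()  # wait for all queued kernels to finish
    allocated = torch.cuda.memory_allocated()
    del a, b
    return allocated


if __name__ == "__main__":
    print(gpu_smoke_test())
```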

[Screenshot: usageHere]

There is therefore no bug and everything works fine.

github-actions[bot] commented 1 year ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Unity-Technologies / ml-agents

GPU not used release19 #5839
