facebookresearch / habitat-sim

A flexible, high-performance 3D simulator for Embodied AI research.
https://aihabitat.org/
MIT License

Support for clusters with AMD gpus? #2402

Open lasuomela opened 1 month ago

lasuomela commented 1 month ago

Hello,

I would like to run Habitat on a cluster with AMD Instinct MI250X GPUs. However, at least the version installed with conda (habitat-sim=0.3.0 [withbullet headless]) fails to run with the error:

Platform::WindowlessEglApplication::tryCreateContext(): unable to find CUDA device 0 among 9 EGL devices in total

From the docs it is not completely clear whether the conda packages are built with CUDA support, so I'm wondering whether building Habitat from source without CUDA would work with AMD GPUs.

Now, reading through https://github.com/facebookresearch/habitat-sim/issues/1511, I get the impression that Habitat actually depends on the NVIDIA OpenGL driver. Is this so, or is there a way to run Habitat without NVIDIA GPUs?

lasuomela commented 4 weeks ago

I'm answering myself after a bit of digging. It seems that AMD GPUs / ROCm do not provide a way to align the GPU 'CUDA device index' with the GPU 'EGL device index'.

See: https://doc.magnum.graphics/magnum/classMagnum_1_1Platform_1_1WindowlessEglApplication.html#Platform-WindowlessEglApplication-device-selection

The Magnum library needs this mapping to place the simulator instance on the same GPU as PyTorch. So, enabling AMD support seems unlikely.
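
To make the failure mode concrete, here is a minimal Python sketch of the kind of lookup described above. It is a toy model, not Magnum's actual code: `EglDevice`, `find_egl_device_for_cuda`, and the device lists are hypothetical names invented for illustration, and the assumption that AMD/Mesa devices report no CUDA index at all is my reading of the linked docs rather than something I have verified in the driver. The nine-device list only mirrors the count in the error message; the split into eight GPUs plus one software device is a guess.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EglDevice:
    """Hypothetical stand-in for one enumerated EGL device."""
    name: str
    # CUDA index reported by the driver for this device, or None when the
    # driver reports nothing (assumed here to be the case on AMD/Mesa).
    cuda_index: Optional[int] = None


def find_egl_device_for_cuda(devices: Sequence[EglDevice], cuda_device: int) -> int:
    """Return the EGL index of the device whose CUDA index matches."""
    for egl_index, device in enumerate(devices):
        if device.cuda_index == cuda_device:
            return egl_index
    # Mirrors the wording of the error in the original report.
    raise RuntimeError(
        f"unable to find CUDA device {cuda_device} "
        f"among {len(devices)} EGL devices in total"
    )


# Assumed NVIDIA node: every GPU reports a CUDA index, the software device does not.
nvidia_node = [EglDevice("gpu0", 0), EglDevice("gpu1", 1), EglDevice("software")]

# Assumed AMD node: nine EGL devices, none of which reports a CUDA index.
amd_node = [EglDevice(f"gpu{i}") for i in range(8)] + [EglDevice("software")]

print(find_egl_device_for_cuda(nvidia_node, 1))  # prints 1

try:
    find_egl_device_for_cuda(amd_node, 0)
except RuntimeError as err:
    print(err)  # unable to find CUDA device 0 among 9 EGL devices in total
```

In this model the lookup fails because no device carries a matching index, not because the GPUs are missing from the EGL device list. That is why selecting a device by its plain EGL index, rather than by CUDA index, looks like the direction a non-NVIDIA path would have to take.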

Could @erikwijmans comment, since I see you wrote the Magnum code for NVIDIA GPU device selection? Would similar functionality be possible with the Mesa driver?

Br, Lauri