NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Can't launch NeMo containers with CUDA support #9268

Closed drunkinlove closed 2 months ago

drunkinlove commented 4 months ago

Describe the bug

After pulling the image and starting the container, I get the following error:

ERROR: The NVIDIA Driver is present, but CUDA failed to initialize.  GPU functionality will not be available.
   [[ Forward compatibility was attempted on non supported HW (error 804) ]]
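One way to narrow this down is to call the CUDA driver API directly from inside the container and look at the raw return code of `cuInit`. This is a suggested diagnostic, not a documented NeMo step. In the driver API, error 804 is `CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE`. The sketch below loads `libcuda.so.1` through `ctypes` and calls `cuDriverGetVersion` and `cuInit`. The helper names `describe_cu_result`, `format_cuda_version` and `probe_cuda` are hypothetical.

```python
import ctypes
from typing import Optional, Tuple

# A few CUresult codes from the CUDA driver API (cuda.h); not an exhaustive list.
CU_RESULTS = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE",
    803: "CUDA_ERROR_SYSTEM_DRIVER_MISMATCH",
    804: "CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE",
}


def describe_cu_result(code: int) -> str:
    """Map a CUresult integer to its enum name, if it is one of the codes above."""
    return CU_RESULTS.get(code, f"UNKNOWN_CURESULT_{code}")


def format_cuda_version(encoded: int) -> str:
    """cuDriverGetVersion encodes CUDA versions as 1000*major + 10*minor (12.2 -> 12020)."""
    return f"{encoded // 1000}.{(encoded % 1000) // 10}"


def probe_cuda() -> Optional[Tuple[str, str]]:
    """Return (CUDA version reported by libcuda, cuInit result name), or None if libcuda is absent."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None  # no driver library visible, e.g. container started without GPU access
    version = ctypes.c_int(0)
    libcuda.cuDriverGetVersion(ctypes.byref(version))  # does not require cuInit first
    rc = libcuda.cuInit(0)  # CUresult as a plain int (ctypes default restype)
    return format_cuda_version(version.value), describe_cu_result(rc)


if __name__ == "__main__":
    result = probe_cuda()
    if result is None:
        print("libcuda.so.1 not found")
    else:
        cuda_version, init_status = result
        print(f"libcuda reports CUDA {cuda_version}; cuInit -> {init_status}")
```

The reported version comes from whichever `libcuda.so.1` the dynamic loader resolves. Inside an NGC image that may be a forward-compatibility copy rather than the host driver, so it is worth comparing the result with `nvidia-smi` on the host.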

Steps/Code to reproduce bug

The command I use to start the container:

docker run -it --gpus '"device=1,2"' --ulimit stack=67108864 --runtime nvidia nvcr.io/nvidia/nemo:24.03.01.framework

I've also tried the nvcr.io/nvidia/nemo:dev.framework image.

Expected behavior

CUDA should initialize properly.

Environment overview (please complete the following information)

GPU: RTX 2080 Ti
NVIDIA driver version: 535.154.05
CUDA version: 12.2
OS: Ubuntu 20.04.5 LTS (amd64)
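With driver 535.154.05, the host supports CUDA up to 12.2. Suppose the image was built against a newer CUDA release. It would then need either a newer host driver or the CUDA forward-compatibility libraries. NVIDIA documents forward compatibility as supported only on data-center GPUs and select NGC Server Ready RTX SKUs, not GeForce cards such as the RTX 2080 Ti. That limitation would be consistent with error 804.

The sketch below compares dotted driver versions. The minimum `550.54.14` in the example is an assumption for illustration: it is the Linux minimum listed for CUDA 12.4 GA in the CUDA toolkit release notes. The requirement for this particular image should be checked against its own release notes. The function names are hypothetical.

```python
from typing import Tuple


def parse_driver_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted driver string like '535.154.05' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets_minimum(installed: str, required: str) -> bool:
    """True if the installed driver version is at least the required one."""
    return parse_driver_version(installed) >= parse_driver_version(required)


if __name__ == "__main__":
    installed = "535.154.05"  # host driver reported in this issue
    required = "550.54.14"    # ASSUMPTION: illustrative minimum, check the image's release notes
    if driver_meets_minimum(installed, required):
        print("host driver is new enough; forward compatibility is not needed")
    else:
        print("host driver is older than required; the container would need "
              "forward compatibility, which GeForce GPUs do not support")
```

Comparing integer tuples avoids the pitfalls of string comparison. For example, `"535.154.05" < "535.54.03"` is true as strings, even though 154 > 54.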

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.