HPCE / hpce-2017-cw6


Broken NVIDIA driver on g3.4xlarge #39

Closed dc3315 closed 6 years ago

dc3315 commented 6 years ago

Hi, I have launched a g3.4xlarge machine and installed all the libraries that I know will be installed as part of the image (i.e. NVIDIA cuda-dev and the nvidia-toolkit for nvcc).

I then encounter the following error: modprobe: ERROR: could not insert 'nvidia_current': No such device

This seems like a driver problem, and I have done a bit of investigating, but unfortunately I can't fix the issue. Is anyone else using NVIDIA having the same problem? Any ideas?

Many thanks.

m8pple commented 6 years ago

Sorry about that, you're not supposed to have to deal with things like that. It's my fault: I encountered the same problem over the summer and fixed it without writing it down. So it is fixed in my long-term AMI, but the fix didn't make it into the public AMI.

The problem is that the g3 GPUs are quite new, so their drivers aren't in the stable set of packages for Debian. However, they are available in backports (a set of packages from the newer distribution which have been back-ported so they can be used in the older distribution).

The manual steps to take are:

1 - Edit /etc/apt/sources.list (for example with nano) to add the back-ported sources.

2 - Update the package list

    sudo apt-get update

3 - Install the nvidia driver from backports

    sudo apt-get install -t jessie-backports nvidia-driver

4 - Wait - this takes about a minute.

After it completes you should have a working OpenCL driver (I tested it using a few programs).
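For anyone who would rather script the steps above, here is a minimal sketch. The comment above does not show the exact line to add to sources.list, so the mirror URL (MIRROR) and the component list (main contrib non-free) below are assumptions, and the function names are made up for illustration. Check them against the mirror you actually use before applying anything.

```shell
#!/bin/sh
# Sketch of the manual steps above. MIRROR and the component list are
# assumptions, not taken from the original comment.
MIRROR="${MIRROR:-http://deb.debian.org/debian}"

# Print the apt source line for the backports suite of a given release.
backports_line() {
    echo "deb $MIRROR $1-backports main contrib non-free"
}

# Apply steps 1-3: add the source (if missing), refresh, install the driver.
install_backports_driver() {
    line="$(backports_line jessie)"
    # Step 1: append the backports source only if it is not already present.
    grep -qxF "$line" /etc/apt/sources.list ||
        echo "$line" | sudo tee -a /etc/apt/sources.list >/dev/null
    # Step 2: update the package list, then
    # Step 3: install the NVIDIA driver from backports.
    sudo apt-get update &&
        sudo apt-get install -t jessie-backports nvidia-driver
}

# Nothing is changed unless the script is run with --apply.
if [ "${1:-}" = "--apply" ]; then
    install_backports_driver
fi
```

Run with no arguments, the script only defines the functions and changes nothing; pass --apply to perform the steps. The grep check is there so that re-running it does not add the same source line twice.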

I need to update the AMI, though I will need to do that in a more controlled way, as it takes a bit of time.