Closed: bounlu closed this issue 2 years ago
Then nvitop stopped working with the error:
NVML ERROR: RM has detected an NVML/RM version mismatch.
@bounlu The command `sudo apt install nvidia-cuda-toolkit` modifies your NVIDIA driver. You need to unload and reload the NVIDIA kernel module. The easiest and safest way is to restart your machine.
If you are sure that no processes (including the desktop GUI) are currently using the GPU, you can try the following commands without a restart:
sudo modprobe -r -f $(sudo lsmod | grep '^nvidia' | awk '{ print $1 }')
nvidia-smi
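To confirm that the error really comes from a version mismatch, one option is to compare the version of the loaded kernel module with the version of the user-space NVML library. The following is only a rough sketch: the function names are made up for illustration, and it assumes the module exposes `/proc/driver/nvidia/version` and that the library is registered with `ldconfig` as `libnvidia-ml.so.1`.

```shell
#!/bin/sh
# Sketch: print the NVIDIA kernel module version and the user-space NVML
# library version so the two can be compared.

# Print the first driver-style version (e.g. 515.65.01) found on stdin.
extract_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Version of the currently loaded kernel module; prints nothing if not loaded.
loaded_module_version() {
    if [ -r /proc/driver/nvidia/version ]; then
        grep 'NVRM' /proc/driver/nvidia/version | extract_version
    fi
}

# Version encoded in the file name of the installed libnvidia-ml library
# (assumption: the soname symlink resolves to libnvidia-ml.so.<version>).
nvml_library_version() {
    lib=$(ldconfig -p 2>/dev/null | awk '/libnvidia-ml\.so\.1/ { print $NF; exit }')
    if [ -n "$lib" ]; then
        basename "$(readlink -f "$lib")" | extract_version
    fi
}

module=$(loaded_module_version)
library=$(nvml_library_version)
echo "kernel module: ${module:-not loaded}"
echo "NVML library:  ${library:-not found}"
if [ -n "$module" ] && [ -n "$library" ] && [ "$module" != "$library" ]; then
    echo "Versions differ, which is consistent with the NVML/RM mismatch error."
fi
```

If the two versions differ, the user-space packages were probably changed after the module was loaded, or several driver versions are installed side by side.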
I already restarted the server, which didn't help. `nvidia-smi` is not installed, and I am afraid that installing it will conflict and break things again:
$ nvidia-smi
Command 'nvidia-smi' not found, but can be installed with:
sudo apt install nvidia-utils-390 # version 390.154-0ubuntu0.22.04.1, or
sudo apt install nvidia-utils-450-server # version 450.203.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-470 # version 470.141.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-470-server # version 470.141.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-510 # version 510.85.02-0ubuntu0.22.04.1
sudo apt install nvidia-utils-510-server # version 510.85.02-0ubuntu0.22.04.1
sudo apt install nvidia-utils-515 # version 515.65.01-0ubuntu0.22.04.1
sudo apt install nvidia-utils-515-server # version 515.65.01-0ubuntu0.22.04.1
sudo apt install nvidia-utils-418-server # version 418.226.00-0ubuntu4
$ sudo pip3 install --upgrade nvitop
Requirement already satisfied: nvitop in /usr/local/lib/python3.10/dist-packages (0.8.1)
Requirement already satisfied: psutil>=5.6.6 in /usr/local/lib/python3.10/dist-packages (from nvitop) (5.9.2)
Requirement already satisfied: cachetools>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from nvitop) (5.2.0)
Requirement already satisfied: nvidia-ml-py<11.500.0a0,>=11.450.51 in /usr/local/lib/python3.10/dist-packages (from nvitop) (11.495.46)
Requirement already satisfied: termcolor>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from nvitop) (2.0.1)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
$ nvitop
NVML ERROR: RM has detected an NVML/RM version mismatch.
$ uname -r
5.15.0-48-generic
$ sudo apt install nvidia-utils-515-server
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
libaccinj64-11.5 libasyncns0 libbabeltrace1 libboost-regex1.74.0 libcub-dev libcublas11 libcublaslt11 libcudart11.0 libcufft10 libcufftw10 libcupti-dev libcupti-doc libcupti11.5 libcurand10
libcusolver11 libcusolvermg11 libcusparse11 libdebuginfod-common libdebuginfod1 libdouble-conversion3 libegl-dev libegl-mesa0 libegl1 libflac8 libgail-common libgail18 libgbm1 libgl-dev
libgl1-mesa-dev libgles-dev libgles1 libgles2 libglvnd-core-dev libglvnd-dev libglx-dev libgtk2.0-0 libgtk2.0-bin libgtk2.0-common libipt2 libnppc11 libnppial11 libnppicc11 libnppidei11
libnppif11 libnppig11 libnppim11 libnppist11 libnppisu11 libnppitc11 libnpps11 libnvblas11 libnvjpeg11 libnvrtc-builtins11.5 libnvrtc11.2 libnvtoolsext1 libnvvm4 libogg0 libopengl-dev
libopengl0 libopus0 libpthread-stubs0-dev libpulse0 libqt5core5a libqt5dbus5 libqt5network5 libsndfile1 libsource-highlight-common libsource-highlight4v5 libtbb-dev libtbb12 libtbbmalloc2
libthrust-dev libvdpau-dev libvdpau1 libvorbis0a libvorbisenc2 libwayland-server0 libx11-dev libxau-dev libxcb-icccm4 libxcb-image0 libxcb-keysyms1 libxcb-render-util0 libxcb-util1
libxcb-xinerama0 libxcb-xkb1 libxcb1-dev libxdmcp-dev libxkbcommon-x11-0 mesa-vdpau-drivers node-html5shiv nsight-compute nsight-compute-target nvidia-cuda-gdb nvidia-cuda-toolkit-doc
nvidia-opencl-dev ocl-icd-libopencl1 ocl-icd-opencl-dev opencl-c-headers opencl-clhpp-headers openjdk-8-jre qttranslations5-l10n vdpau-driver-all x11proto-dev xorg-sgml-doctools xtrans-dev
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
libnvidia-compute-515-server
Suggested packages:
nvidia-driver-515-server
The following packages will be REMOVED:
libcuinj64-11.5 libnvidia-compute-495 libnvidia-compute-510 libnvidia-ml-dev nsight-systems nsight-systems-target nvidia-cuda-dev nvidia-cuda-toolkit nvidia-profiler nvidia-visual-profiler
The following NEW packages will be installed:
libnvidia-compute-515-server nvidia-utils-515-server
0 upgraded, 2 newly installed, 10 to remove and 8 not upgraded.
Need to get 365 kB/50.3 MB of archives.
After this operation, 2,733 MB disk space will be freed.
Do you want to continue? [Y/n] n
Abort.
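When the worry is that an install will remove other packages, as in the output above, the removals can be previewed without touching the system by using the `--simulate` option of `apt-get`. A minimal sketch, assuming the simulated output marks removals with lines that start with `Remv`; the helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: list what an apt install would remove, without changing the system.

# Filter "apt-get --simulate" output on stdin down to the packages marked
# for removal (lines of the form "Remv <package> [<version>]").
packages_to_remove() {
    awk '$1 == "Remv" { print $2 }'
}

# Hypothetical helper: preview the removals for the package named in $1.
preview_removals() {
    apt-get install --simulate "$1" 2>/dev/null | packages_to_remove
}

# Example call, commented out so that pasting the sketch runs nothing:
# preview_removals nvidia-utils-515-server
```

Nothing is installed or removed in simulation mode, so the call can be repeated for each candidate package before committing to one.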
> I installed `nvitop` via `pip3` as described and it worked fine.
> `nvidia-smi` is not installed
How did you install the NVIDIA driver: via a `.run` file or via `apt`? If you installed the driver via a `.run` file, you should uninstall it via the `.run` file first.
If you installed the NVIDIA driver via `apt`, try:
dpkg-query --show --showformat='${binary:Package} ${Status}\n' |
grep -v deinstall | awk '{ print $1 }' | grep nvidia-driver |
xargs -L 1 sudo apt remove --purge
sudo apt autoremove
to uninstall your driver first. Then:
git clone --depth=1 https://github.com/XuehaiPan/nvitop.git
cd nvitop
sudo chvt 3
./install-nvidia-driver.sh
See NVIDIA driver installer for more details.
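Before running the purge pipeline above, it may be worth previewing which packages it would select. The sketch below reuses the same filters but stops before `xargs`, so nothing is removed; the function name `select_driver_packages` is made up for illustration.

```shell
#!/bin/sh
# Sketch: show which packages the purge pipeline would select, without
# removing anything.

# Read "<package> <status...>" lines on stdin and print the names of
# nvidia-driver packages that are not marked "deinstall".
select_driver_packages() {
    grep -v deinstall | awk '{ print $1 }' | grep nvidia-driver
}

if command -v dpkg-query >/dev/null 2>&1; then
    found=$(dpkg-query --show --showformat='${binary:Package} ${Status}\n' |
        select_driver_packages || true)
    echo "${found:-no installed nvidia-driver packages found}"
else
    echo "dpkg-query is not available on this system"
fi
```

If the list looks right, the original pipeline with `xargs -L 1 sudo apt remove --purge` removes exactly those packages.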
Following these fixed it. Thanks a million.
I installed `nvitop` via `pip3` as described and it worked fine. Then I installed `nvcc` via:
sudo apt install nvidia-cuda-toolkit
Then `nvitop` stopped working with the error:
NVML ERROR: RM has detected an NVML/RM version mismatch.
How can I make both work?