-
Ubuntu 24.10
$ ./target/release/nvidia-tuner --index 0 --pairs 30:100
thread 'main' panicked at src/main.rs:23:86:
called `Result::unwrap()` on an `Err` value: "Failed to initialize NVML: a libl…
-
There are three NVML init calls in the daemonset; there should be only one init call, made during setup, and NVML should be shut down when the manager is stopped. There are several ways to achieve this; …
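
One possible shape for the single-init idea described above is sketched here in Python. It is an illustration only, not the daemonset's actual code: `NvmlLifecycle`, `setup`, and `stop` are hypothetical names, and the two callables are stand-ins for whatever NVML init and shutdown functions the real binding provides.

```python
import threading
from typing import Callable, List


class NvmlLifecycle:
    """Hypothetical wrapper: initialise once in setup(), shut down once in stop()."""

    def __init__(self, init_fn: Callable[[], None], shutdown_fn: Callable[[], None]) -> None:
        # init_fn / shutdown_fn are assumed stand-ins for the binding's
        # NVML init and shutdown calls; they are injected so the sketch
        # runs without a GPU or driver.
        self._init_fn = init_fn
        self._shutdown_fn = shutdown_fn
        self._lock = threading.Lock()
        self._initialized = False

    def setup(self) -> None:
        # Repeated setup() calls are no-ops once init has succeeded.
        with self._lock:
            if self._initialized:
                return
            self._init_fn()
            self._initialized = True

    def stop(self) -> None:
        # Shut down only if setup() previously succeeded.
        with self._lock:
            if not self._initialized:
                return
            self._shutdown_fn()
            self._initialized = False


# Usage with recording stubs in place of real NVML calls.
calls: List[str] = []
lifecycle = NvmlLifecycle(lambda: calls.append("init"), lambda: calls.append("shutdown"))
lifecycle.setup()
lifecycle.setup()  # second call does not re-initialise
lifecycle.stop()
lifecycle.stop()   # second call does not shut down again
```

Every component that needs NVML would share one such object instead of calling init itself, so init and shutdown stay paired with the manager's lifetime.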
-
I am trying to compile the dcgm-exporter tool (git version 965b2de86d647d6c4c3a9ebe0d66e7ebf46045f5), but the build fails with compilation errors in go-dcgm:
```
(golang) root@9ecfb1571994:/mnt/dcgm-exporter# go ver…
-
**Output of the [info page](https://docs.datadoghq.com/agent/guide/agent-commands/#service-status)**
When installing the NVML integration, I get the following error:
Loading Errors
=========…
-
I am trying to deploy the Llama 3.1-8B Instruct NIM on SageMaker as an endpoint, following this notebook: https://github.com/NVIDIA/nim-deploy/blob/main/cloud-service-providers/aws/sagemaker/nim_llam…
-
**Describe the bug**
Using CUDA.jl on an NVIDIA Jetson, `CUDA.versioninfo()` gives the following error:
```julia
[...]
Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6
Preferences:
- CUDA_Runtime_jll.local:…
-
Immediately after a restart, the host may switch from the GPU/CUDA device to CLANG, and nothing can be done except shutting down other peers to find which one is causing the issue.
On the unhealthy host, the issue is detected by basic health …
-
#### **Summary**
Customer reported that `nvidia-smi` stops working in Kubernetes pods with the error `Failed to initialize NVML: Unknown Error` after some time.
:green_circle: Worth noting, applic…
-
**Title:** NVIDIA Driver Failure on GPU Nodes: Driver/Library Version Mismatch Error
**Description:**
On the GPU nodes, the NVIDIA driver intermittently fails, causing the following error:
```s…
-
I imported faulthandler and got some info:
```
Jul 20 22:41:46 drache nvml-undervolt[86317]: Warning: Persistence mode is already enabled - make sure no oth>
Jul 20 22:43:35 drache nvml-undervo…