-
# A step-by-step practical guide for deploying NVIDIA GPUs on Kubernetes
This blog post shows how to build a Kubernetes (k8s) cluster that takes GPU resources into account.
[https://mickael-baron.fr/…
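In a GPU-aware cluster, a workload asks for GPUs through an extended resource. The Python sketch below builds a minimal Pod manifest as a plain dict. It assumes the NVIDIA device plugin (for example, the one the GPU Operator deploys) advertises GPUs as `nvidia.com/gpu`. The pod name, container name, and image tag are illustrative placeholders, not taken from the guide.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests `gpus` NVIDIA GPUs.

    Assumes the NVIDIA device plugin advertises GPUs under the
    extended resource name `nvidia.com/gpu`.
    """
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": "cuda",  # hypothetical container name
                    "image": image,
                    "command": ["nvidia-smi"],
                    # GPUs are set in limits; when requests are omitted,
                    # Kubernetes defaults them to the limit values.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical name and illustrative image tag; substitute your own.
    manifest = gpu_pod_manifest("gpu-smoke-test", "nvidia/cuda:12.2.0-base-ubuntu22.04")
    print(json.dumps(manifest, indent=2))
```

Because JSON is valid input for `kubectl apply -f`, the printed output can be saved to a file and applied directly. Keeping the manifest as a dict makes it easy to vary the GPU count per job.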
-
Hello, NVIDIA Team.
I'm facing an issue while configuring `dcgm-exporter` from `gpu-operator`. I have 2 Kubernetes clusters: one is a cluster where GPU jobs run, and the other is used for managing…
-
### What happened + What you expected to happen
1. Operating environment:
   - Python 3.6.5
   - Ray 2.3.1
   - Kubernetes 1.18
2. Bug description
   A Ray cluster with 1 head node + 800 worker node…
-
**What I'd like:**
NVIDIA time-slicing landed (see #2347) in [Bottlerocket 1.25](https://github.com/bottlerocket-os/bottlerocket/blob/develop/CHANGELOG.md#v1250-2024-10-15). While a step forwar…
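For context, time-slicing in the upstream NVIDIA `k8s-device-plugin` is driven by a `sharing.timeSlicing` config that exposes each physical GPU as several schedulable `nvidia.com/gpu` units. The Python sketch below builds that config as a dict and computes how many units a node would advertise. It follows the upstream device-plugin format. The Bottlerocket settings that expose this feature are not shown, and no particular Bottlerocket setting names are assumed.

```python
import json


def time_slicing_config(replicas: int, resource: str = "nvidia.com/gpu") -> dict:
    """Build a device-plugin sharing config that time-slices every
    physical GPU of `resource` into `replicas` schedulable units."""
    if replicas < 1:
        raise ValueError("replicas must be a positive integer")
    return {
        "version": "v1",
        "sharing": {
            "timeSlicing": {
                "resources": [{"name": resource, "replicas": replicas}],
            }
        },
    }


def advertised_gpus(physical_gpus: int, config: dict) -> int:
    """Units the node would advertise: each physical GPU is exposed
    `replicas` times. Assumes exactly one resource entry in the config."""
    (entry,) = config["sharing"]["timeSlicing"]["resources"]
    return physical_gpus * entry["replicas"]


if __name__ == "__main__":
    cfg = time_slicing_config(replicas=10)
    print(json.dumps(cfg, indent=2))  # JSON is also valid YAML
    print(advertised_gpus(8, cfg))  # 8 GPUs x 10 replicas
```

With 8 physical GPUs and `replicas: 10`, the node advertises 80 `nvidia.com/gpu` units. Time-sliced replicas share the GPU's memory and provide no fault isolation between pods, which is one reason time-slicing alone is often seen as only a partial solution.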
-
### Feature Description
## Problem Statement
In Kubernetes environments using Kueue for resource management and KubeStellar for multi-cluster orchestration, there's a need for dynamic resource all…
-
### TL;DR
Upgrading from one module version to another causes the node pool to be recreated
### Expected behavior
When we upgrade GKE module versions, we see breaking changes wher…
-
**Describe the bug**
Following the official examples [here](https://docs.nvidia.com/spark-rapids/user-guide/latest/examples.html#ref-sec-profcmd-cli-samples), I cannot profile event logs from Dataproc …
-
### What is the version?
3.1.8-3.1.5-ubuntu20.04
### What happened?
We have been using the GPU Operator in a Kubernetes cluster. GPU Operator Helm chart version: gpu-operator-v23.6.1. Kubernetes version: …
-
Hey all,
I have an on-premises Kubernetes cluster with multiple nodes. One of these nodes is equipped with two different GPU models:
NVIDIA GeForce RTX 3090 and NVIDIA GeForce RTX 4090
When I SSH i…
-
**What would you like to be added**
Support for GPU nodes (specs), so that AI workloads can be run on Kubernetes clusters.
**Why is this needed**
Many people already do machine learning on K8s, so offering K8s with GPU nodes can be considered a basic feature…