NVIDIA / cloud-native-stack

Run cloud native workloads on NVIDIA GPUs
Apache License 2.0

NVIDIA Cloud Native Stack

Introduction

NVIDIA Cloud Native Stack (CNS) is a collection of software to run cloud native workloads on NVIDIA GPUs. NVIDIA Cloud Native Stack is based on Ubuntu/RHEL, Kubernetes, Helm and the NVIDIA GPU and Network Operator.

Interested in deploying NVIDIA Cloud Native Stack? This repository has install guides for manual installations and Ansible playbooks for automated installations.

Interested in a pre-provisioned NVIDIA Cloud Native Stack environment? NVIDIA LaunchPad provides pre-provisioned environments so that you can quickly get started.

Objective

Life Cycle

When a new NVIDIA Cloud Native Stack batch is released, the previous batch enters maintenance support and receives only patch release updates. All earlier batches enter end-of-life (EOL): they are no longer supported and do not receive patch updates.

Note: Upgrades are only supported from the previous batch to the latest batch.

Batch Status
24.8.1 Generally Available
24.5.0 Maintenance
24.3.0 and lower EOL
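
The life cycle rule above can be sketched as a small shell helper. This is only an illustration, not part of the repository: the function names cns_batch_status and cns_upgrade_supported are hypothetical, and the caller is assumed to pass the list of batches ordered newest first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the life cycle rule; not shipped with CNS.

# Usage: cns_batch_status <batch> <newest-batch> [older-batch ...]
# The batch list must be ordered newest first.
cns_batch_status() {
  local query="$1"
  shift
  local index=0 batch
  for batch in "$@"; do
    if [ "$batch" = "$query" ]; then
      case "$index" in
        0) echo "Generally Available" ;;  # latest batch
        1) echo "Maintenance" ;;          # previous batch, patch releases only
        *) echo "EOL" ;;                  # everything older
      esac
      return 0
    fi
    index=$((index + 1))
  done
  echo "unknown"
  return 1
}

# Usage: cns_upgrade_supported <from> <to> <newest-batch> [older-batch ...]
# Succeeds only for an upgrade from the previous batch to the latest batch.
cns_upgrade_supported() {
  local from="$1" to="$2"
  shift 2
  [ "$to" = "$1" ] && [ "$from" = "$2" ]
}

cns_batch_status 24.5.0 24.8.1 24.5.0 24.3.0   # prints: Maintenance
```

With the batches listed above, cns_upgrade_supported 24.5.0 24.8.1 24.8.1 24.5.0 24.3.0 succeeds, while an upgrade starting from 24.3.0 is rejected.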

For more information, refer to Cloud Native Stack Releases.

Component Matrix

Cloud Native Stack Batch 24.8.0 (Release Date: 20 August 2024)

CNS Version 13.1 12.2 11.3
Platforms (same for all three versions)
  • NVIDIA Certified Server (x86 & arm64)
  • DGX Server
Supported OS (same for all three versions)
  • Ubuntu 22.04 LTS
  • RHEL 8.8
  • DGX OS 6.2 (Ubuntu 22.04 LTS)
Containerd 1.7.20 1.7.20 1.7.20
NVIDIA Container Toolkit 1.16.2 1.16.2 1.16.2
CRI-O 1.30.2 1.29.6 1.28.8
Kubernetes 1.30.2 1.29.6 1.28.12
CNI (Calico) 3.27.4 3.27.4 3.27.4
NVIDIA GPU Operator 24.6.2 24.6.2 24.6.2
NVIDIA Network Operator 24.4.1(x86 Only) 24.4.1(x86 Only) 24.4.1(x86 Only)
NVIDIA Data Center Driver 550.90.07 550.90.07 550.90.07
Helm 3.15.3 3.15.3 3.15.3
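
As a convenience, the Kubernetes row of the matrix above can be encoded in a lookup helper, for example to check a version string reported by a node against the expected value. This is a sketch only: cns_k8s_version and cns_k8s_matches are hypothetical names, and the versions are copied from the matrix for this batch.

```shell
#!/usr/bin/env bash
# Hypothetical lookup of the Kubernetes version for a CNS version,
# using the values from the component matrix of this batch.
cns_k8s_version() {
  case "$1" in
    13.1) echo "1.30.2" ;;
    12.2) echo "1.29.6" ;;
    11.3) echo "1.28.12" ;;
    *) echo "unknown CNS version: $1" >&2; return 1 ;;
  esac
}

# Usage: cns_k8s_matches <cns-version> <reported-kubernetes-version>
# A leading "v" (as in "v1.30.2") is ignored.
cns_k8s_matches() {
  local expected
  expected="$(cns_k8s_version "$1")" || return 1
  [ "${2#v}" = "$expected" ]
}

cns_k8s_version 12.2   # prints: 1.29.6
```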

Note: Previous Cloud Native Stack release information can be found here.

NOTE: Cloud Native Stack versions are available on the master branch, but it is recommended to use the specific branch.

Software

CNS Version 13.1 12.2 11.3
MicroK8s 1.30 1.29 1.28
KServe (same for all three versions): 0.13
  • Istio: 1.20.4
  • Knative: 1.13.1
  • CertManager: 1.9.0
LoadBalancer MetalLB: 0.14.5 MetalLB: 0.14.5 MetalLB: 0.14.5
Storage (same for all three versions)
  • NFS: 4.0.18
  • Local Path: 0.0.26
Monitoring (same for all three versions)
  • Prometheus: 61.3.0
  • Elastic: 8.14.1

Getting Started

Prerequisites

Make sure you meet the following prerequisites before installing Cloud Native Stack.

Installation

Run the following commands to clone the NVIDIA Cloud Native Stack repository and change into the playbooks directory.

git clone https://github.com/NVIDIA/cloud-native-stack.git
cd cloud-native-stack/playbooks

Update the hosts file in the playbooks directory with the IP addresses of the master node and any worker nodes, together with the SSH username and password, as shown below.

nano hosts

[master]
<master-IP> ansible_ssh_user=nvidia ansible_ssh_pass=nvidiapass ansible_sudo_pass=nvidiapass ansible_ssh_common_args='-o StrictHostKeyChecking=no'
[nodes]
<worker-IP> ansible_ssh_user=nvidia ansible_ssh_pass=nvidiapass ansible_sudo_pass=nvidiapass ansible_ssh_common_args='-o StrictHostKeyChecking=no'
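
If you prefer to generate the hosts file instead of editing it by hand, the format above can be produced with a small helper. This is a minimal sketch: cns_hosts is a hypothetical function, it assumes the same user and password on every node, and the 192.0.2.x addresses are documentation-only placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper that prints an inventory in the format shown above.
# Usage: cns_hosts <user> <password> <master-ip> [worker-ip ...]
cns_hosts() {
  local user="$1" pass="$2" master="$3"
  shift 3
  # Connection variables are the ones used in the example hosts file.
  local opts="ansible_ssh_user=$user ansible_ssh_pass=$pass ansible_sudo_pass=$pass ansible_ssh_common_args='-o StrictHostKeyChecking=no'"
  local worker
  echo "[master]"
  echo "$master $opts"
  echo "[nodes]"
  for worker in "$@"; do
    echo "$worker $opts"
  done
}

# Example with placeholder credentials and addresses; redirect to the
# hosts file once the values are correct for your environment.
cns_hosts nvidia nvidiapass 192.0.2.10 192.0.2.11 192.0.2.12
```

For a single-node installation, pass only the master IP; the [nodes] section is then left empty.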

Install NVIDIA Cloud Native Stack by running the following command. "Skipping" in the Ansible output means that the Kubernetes cluster is already up and running.

bash setup.sh install
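
Skipped tasks are not failures. To tell the two apart, one option is to check the PLAY RECAP lines that Ansible prints at the end of a run. The sketch below assumes you have saved the installer output to a file; cns_recap_ok and install.log are hypothetical names, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical check of Ansible PLAY RECAP lines read from standard input.
# Succeeds only if at least one recap line was found and every host
# reports failed=0 and unreachable=0. Skipped tasks are ignored.
cns_recap_ok() {
  awk '
    /unreachable=[0-9]+/ && /failed=[0-9]+/ {
      seen = 1
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0) {
          bad = 1
        }
      }
    }
    END {
      if (seen && !bad) exit 0
      exit 1
    }
  '
}

# Usage, assuming the installer output was saved to install.log:
#   cns_recap_ok < install.log && echo "no failed or unreachable hosts"
```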

For more information about customizing the values, refer to Installation.

Topologies

NOTE: Cloud Native Stack does not support deploying multiple control plane nodes.
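
Because only one control plane node is supported, it can help to check the hosts file before installing. The sketch below counts the entries in the [master] group; it assumes that the [master] group corresponds to the control plane node, and cns_master_count is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: count host entries in the [master] group
# of an INI-style hosts file. Blank lines and comment lines are ignored.
# Usage: cns_master_count <hosts-file>
cns_master_count() {
  awk '
    /^\[/ { in_master = ($0 == "[master]"); next }
    in_master && NF > 0 && $0 !~ /^[#;]/ { count++ }
    END { print count + 0 }
  ' "$1"
}

# Usage from the playbooks directory:
#   [ "$(cns_master_count hosts)" -eq 1 ] || echo "expected exactly one master entry"
```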

Troubleshooting

Troubleshoot CNS installation issues

Getting Help or Providing Feedback

Please open an issue on the GitHub project for any questions. Your feedback is appreciated.

Useful Links