  NetworkUnavailable   False   Tue, 21 May 2024 00:17:41 +0800   Tue, 21 May 2024 00:17:41 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Tue, 21 May 2024 11:41:24 +0800   Thu, 09 May 2024 15:48:31 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 21 May 2024 11:41:24 +0800   Thu, 09 May 2024 15:48:31 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 21 May 2024 11:41:24 +0800   Thu, 09 May 2024 15:48:31 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Tue, 21 May 2024 11:41:24 +0800   Tue, 21 May 2024 00:17:52 +0800   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.2.145
  Hostname:    416a100
Capacity:
  cpu:                80
  ephemeral-storage:  459819088Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             263739228Ki
  nvidia.com/gpu:     0
  pods:               110
Allocatable:
  cpu:                80
  ephemeral-storage:  447312008456
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             263739228Ki
  nvidia.com/gpu:     0
  pods:               110
2. nvidia-smi
nvidia-smi
Tue May 21 11:44:30 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:36:00.0 Off |                    0 |
| N/A   42C    P0             46W /  250W |      13MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off |   00000000:37:00.0 Off |                    0 |
| N/A   42C    P0             45W /  250W |      13MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA RTX A6000               Off |   00000000:9D:00.0 Off |                  Off |
| 30%   37C    P8             22W /  300W |      14MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA RTX A6000               Off |   00000000:9E:00.0 Off |                  Off |
| 30%   36C    P8             28W /  300W |      14MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2480      G   /usr/lib/xorg/Xorg                              4MiB |
|    1   N/A  N/A      2480      G   /usr/lib/xorg/Xorg                              4MiB |
|    2   N/A  N/A      2480      G   /usr/lib/xorg/Xorg                              4MiB |
|    3   N/A  N/A      2480      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+
3. sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Tue May 21 03:44:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:36:00.0 Off |                    0 |
| N/A   42C    P0             46W /  250W |      13MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off |   00000000:37:00.0 Off |                    0 |
| N/A   42C    P0             45W /  250W |      13MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA RTX A6000               Off |   00000000:9D:00.0 Off |                  Off |
| 30%   37C    P8             22W /  300W |      14MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA RTX A6000               Off |   00000000:9E:00.0 Off |                  Off |
| 30%   35C    P8             27W /  300W |      14MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
4. hami pod:
kubectl get pods -n kube-system
NAME                                      READY   STATUS      RESTARTS   AGE
helm-install-traefik-nzsh4                0/1     Completed   0          11d
svclb-traefik-cpwxf                       2/2     Running     40         11d
metrics-server-7b4f8b595-5kn69            1/1     Running     21         11d
local-path-provisioner-64d457c485-nccpm   1/1     Running     20         11d
coredns-5d69dc75db-q7rxn                  1/1     Running     20         11d
traefik-5dd496474-rxmr2                   1/1     Running     20         11d
nvidia-device-plugin-daemonset-jg762      1/1     Running     0          5m50s
hami-device-plugin-nv5gs                  2/2     Running     0          4m43s
hami-scheduler-757847d79f-n7dbf           2/2     Running     0          4m43s