NVIDIA / k8s-device-plugin

NVIDIA device plugin for Kubernetes
Apache License 2.0

Can't understand why I need nvidia-device-plugin #265

Open k0nstantinv opened 3 years ago

k0nstantinv commented 3 years ago

Hi! I've read all the k8s docs, I've read all the local docs about the plugin itself, I understand what the nvidia-container-runtime is, and I've tried deploying this device plugin as well as the device plugin from GCP, so I have no questions about how to deploy it. But...

I completely can't understand why I need it. Maybe I'm missing something. Let me show you.

I have a 1.14 cluster with a bare-metal node that has a Tesla K40c and the NVIDIA/CUDA drivers installed.

Here is my nvidia-smi output:

==============NVSMI LOG==============

Timestamp                           : Wed Sep 15 21:42:05 2021
Driver Version                      : 440.64.00
CUDA Version                        : 10.2

Attached GPUs                       : 1
GPU 00000000:84:00.0
    Product Name                    : Tesla K40c
    Product Brand                   : Tesla
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0321816019511
    GPU UUID                        : GPU-afa1c01a-3776-2166-5689-cc8ef444f42b
    Minor Number                    : 0
    VBIOS Version                   : 80.80.65.00.03
    MultiGPU Board                  : No
    Board ID                        : 0x8400
    GPU Part Number                 : 900-22081-0350-000
    Inforom Version
        Image Version               : 2081.0206.01.04
        OEM Object                  : 1.1
        ECC Object                  : 3.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x84
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x102410DE
        Bus Id                      : 00000000:84:00.0
        Sub System Id               : 0x0983103C
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : 25 %
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : N/A
            HW Power Brake Slowdown : N/A
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 11441 MiB
        Used                        : 2559 MiB
        Free                        : 8882 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 2 MiB
        Free                        : 254 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
        Aggregate
            Single Bit
                Device Memory       : 5
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 5
            Double Bit
                Device Memory       : 2
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 2
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 1
        Pending Page Blacklist      : No
    Temperature
        GPU Current Temp            : 50 C
        GPU Shutdown Temp           : 95 C
        GPU Slowdown Temp           : 90 C
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 68.75 W
        Power Limit                 : 235.00 W
        Default Power Limit         : 235.00 W
        Enforced Power Limit        : 235.00 W
        Min Power Limit             : 180.00 W
        Max Power Limit             : 235.00 W
    Clocks
        Graphics                    : 745 MHz
        SM                          : 745 MHz
        Memory                      : 3004 MHz
        Video                       : 540 MHz
    Applications Clocks
        Graphics                    : 745 MHz
        Memory                      : 3004 MHz
    Default Applications Clocks
        Graphics                    : 745 MHz
        Memory                      : 3004 MHz
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 3004 MHz
        Video                       : 540 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 16885
            Type                    : C
            Name                    : python3
            Used GPU Memory         : 1272 MiB
        Process ID                  : 16910
            Type                    : C
            Name                    : python3
            Used GPU Memory         : 1272 MiB

Docker 19.03 along with nvidia-container-runtime is installed and configured:

dpkg -l | grep docker
ii  docker-ce                       5:19.03.8~3-0~debian-stretch        amd64        Docker: the open-source application container engine
ii  docker-ce-cli                   5:19.03.15~3-0~debian-stretch       amd64        Docker CLI: the open-source application container engine
cat /etc/docker/daemon.json
{
  "live-restore": true,
  "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

nvidia-container-cli -V
version: 1.3.3
build date: 2021-02-05T13:30+00:00
build revision: bd9fc3f2b642345301cb2e23de07ec5386232317
build compiler: x86_64-linux-gnu-gcc-6 6.3.0 20170516
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

My setup works

Let me explain what I don't understand. I need a GPU resource on the node that will be used by the scheduler, right? OK, I can PATCH my node with an extended resource as described here.

Get the total GPU memory from nvidia-smi:

gpu_memory="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null)"

and push it right into the node status, something like:

# API_URL, KEY, CERT and CA are assumed to be set elsewhere: the node's /status
# endpoint on the API server and the client TLS credentials.
CONTENT_TYPE="application/json-patch+json"
RESOURCE_NAME="example.ru~1gpu_memory"   # "~1" is the JSON-Pointer escape for "/"
CAPACITY_PATH="/status/capacity/${RESOURCE_NAME}"
gpu_memory="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null)" #(11441)
data="[{\"op\": \"add\", \"path\": \"${CAPACITY_PATH}\", \"value\": \"${gpu_memory}\"}]"

curl \
  --silent \
  "${API_URL}" \
  --key "${KEY}" \
  --cert "${CERT}" \
  --cacert "${CA}" \
  --header "Content-Type: ${CONTENT_TYPE}" \
  --request PATCH \
  --data "${data}"

That's it! My node has the resource:

Capacity:
...
 example.ru/gpu_memory:       11441
...

Here is the pod YAML:

$ cat gpu.pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:9.0-devel
      resources:
        limits:
          example.ru/gpu_memory: "128"
    - name: digits-container
      image: nvcr.io/nvidia/digits:20.12-tensorflow-py3
      resources:
        limits:
          example.ru/gpu_memory: "128"

I'm going to deploy it all without the device plugin:

k apply -f gpu.pod.yaml

I can see the GPU has been detected, although it says the model is not supported by that version of the DIGITS image:

Containers:
  cuda-container:
    Container ID:   docker://76683ea186d0be74124dced77ad71207decf4b9534c7b41292377956716d6e9e
    Image:          nvcr.io/nvidia/cuda:9.0-devel
    Image ID:       docker-pullable://nvcr.io/nvidia/cuda@sha256:879e34e7059ed350140bb0b40f1b1c543846ce9a2088133494b0b3495d8c92c5
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed <------------------------------
      Exit Code:    0
      Started:      Wed, 15 Sep 2021 22:22:22 +0300
      Finished:     Wed, 15 Sep 2021 22:22:22 +0300
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 15 Sep 2021 22:22:10 +0300
      Finished:     Wed, 15 Sep 2021 22:22:10 +0300
    Ready:          False
    Restart Count:  1
    Limits:
      example.ru/gpu_memory:  128
    Requests:
      example.ru/gpu_memory:  128
    Environment:              <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pv8hv (ro)

$ k logs -f gpu-pod digits-container

============
== DIGITS ==
============

NVIDIA Release 20.12 (build 17912121)
DIGITS Version 6.1.1

Container image Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
DIGITS Copyright (c) 2014-2019, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
ERROR: Detected NVIDIA Tesla K40c GPU, which is not supported by this container
ERROR: No supported GPU(s) detected to run this container

NOTE: Legacy NVIDIA Driver detected.  Compatibility mode ENABLED.

2021-09-15 19:22:18.710270: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
  ___ ___ ___ ___ _____ ___
 |   \_ _/ __|_ _|_   _/ __|
 | |) | | (_ || |  | | \__ \
 |___/___\___|___| |_| |___/ 6.1.1

The docs say:

The NVIDIA device plugin for Kubernetes is a Daemonset that allows you to automatically:

  • Expose the number of GPUs on each node of your cluster
  • Keep track of the health of your GPUs
  • Run GPU enabled containers in your Kubernetes cluster.

So I can run any number of pods with GPU-enabled containers in k8s without the device plugin. Containers have all they need to work with the GPU from docker/nvidia-container-runtime, don't they? How could the device plugin help me? What advantages could it give? I appreciate any help, advice, links to learn from, or explanations you can give. I just want to make it clear for myself.

cdesiniotis commented 3 years ago

I would advise reading up on the device plugin framework, which should help you understand the motivation, use cases, advantages: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md

Containers have all they need to work with GPU from docker/nvidia-container-runtime, don't they?

Yes, you are correct. The nvidia-container-toolkit stack, which includes libnvidia-container, nvidia-container-runtime, etc., is all you need to run GPU workloads in containers. The NVIDIA device plugin is specific to Kubernetes.
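
For illustration, a minimal sketch assuming the nvidia runtime is set as the default (as in the daemon.json above): a plain docker run already gets the driver libraries injected, and NVIDIA_VISIBLE_DEVICES controls which GPUs the container sees, with no Kubernetes involved.

# Sketch: runs nvidia-smi inside a CUDA container; the nvidia runtime injects the driver
# libraries and utilities based on NVIDIA_VISIBLE_DEVICES.
docker run --rm -e NVIDIA_VISIBLE_DEVICES=all nvcr.io/nvidia/cuda:9.0-devel nvidia-smi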

I need gpu resource on the node that will be used by the scheduler

Yes, the device plugin makes the Kubernetes scheduler aware of gpu resources in your cluster. In your example, you manually did this. The major advantage of the device plugin is that it automates this process for all nodes and allows you to scale up/down your cluster seamlessly.
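
For illustration, a minimal sketch of what that automation buys you: with the plugin DaemonSet running, the kubelet advertises an nvidia.com/gpu resource on every GPU node, and a pod simply requests whole GPUs instead of a hand-patched capacity value:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:9.0-devel
      resources:
        limits:
          nvidia.com/gpu: 1   # whole GPUs, advertised by the device plugin

The plugin's Allocate response also tells the kubelet which specific device(s) to expose to each container (via NVIDIA_VISIBLE_DEVICES), so you do not have to manage device selection or isolation yourself.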

k0nstantinv commented 3 years ago

@cdesiniotis thanks a lot! I've read it recently. I don't understand how the device plugin provides libs to the pod container when they have already been provided by the nvidia-container-toolkit stack. Where are vars like NVIDIA_VISIBLE_DEVICES etc. declared? How is an app able to read and understand such a var? Do I need a special base image for that? Is it for restricting some GPU capabilities from an app? I've surfed through tens of closed and open issues and still can't understand; it is confusing me a lot.

MaxTranced commented 11 months ago

I would advise reading up on the device plugin framework, which should help you understand the motivation, use cases, advantages: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md

The current design proposal link is https://github.com/kubernetes/design-proposals-archive/blob/acc25e14ca83dfda4f66d8cb1f1b491f26e78ffe/resource-management/device-plugin.md

github-actions[bot] commented 8 months ago

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.