NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

Excessive runtime logging could cause Kubernetes workload deployment failure #511

Open weistonedawei opened 1 month ago

weistonedawei commented 1 month ago

Observed a Kubernetes workload deployment failure caused by excessive logging to the /run/containerd/io.containerd.runtime.v2.task/k8s.io//log.json file. This drives the /run tmpfs mount to 100% utilization, which prevents further container creation on the affected node.

When a container spec uses an exec livenessProbe, the following entries are logged on every probe invocation:

{"level":"info","msg":"Running with config:\n{\n  \"DisableRequire\": false,\n  \"SwarmResource\": \"\",\n  \"AcceptEnvvarUnprivileged\": true,\n  \"AcceptDeviceListAsVolumeMounts\": false,\n  \"SupportedDriverCapabilities\": \"compat32,compute,display,graphics,ngx,utility,video\",\n  \"NVIDIAContainerCLIConfig\": {\n    \"Root\": \"/run/nvidia/driver\",\n    \"Path\": \"/usr/local/nvidia/toolkit/nvidia-container-cli\",\n    \"Environment\": [],\n    \"Debug\": \"\",\n    \"Ldcache\": \"\",\n    \"LoadKmods\": true,\n    \"NoPivot\": false,\n    \"NoCgroups\": false,\n    \"User\": \"\",\n    \"Ldconfig\": \"@/run/nvidia/driver/sbin/ldconfig.real\"\n  },\n  \"NVIDIACTKConfig\": {\n    \"Path\": \"/usr/local/nvidia/toolkit/nvidia-ctk\"\n  },\n  \"NVIDIAContainerRuntimeConfig\": {\n    \"DebugFilePath\": \"/dev/null\",\n    \"LogLevel\": \"info\",\n    \"Runtimes\": [\n      \"docker-runc\",\n      \"runc\"\n    ],\n    \"Mode\": \"cdi\",\n    \"Modes\": {\n      \"CSV\": {\n        \"MountSpecPath\": \"/etc/nvidia-container-runtime/host-files-for-container.d\"\n      },\n      \"CDI\": {\n        \"SpecDirs\": [\n          \"/etc/cdi\",\n          \"/var/run/cdi\"\n        ],\n        \"DefaultKind\": \"management.nvidia.com/gpu\",\n        \"AnnotationPrefixes\": [\n          \"nvidia.cdi.k8s.io/\"\n        ]\n      }\n    }\n  },\n  \"NVIDIAContainerRuntimeHookConfig\": {\n    \"Path\": \"/usr/local/nvidia/toolkit/nvidia-container-runtime-hook\",\n    \"SkipModeDetection\": true\n  }\n}","time":"2024-05-24T20:31:18+02:00"}
{"level":"info","msg":"Using low-level runtime /usr/sbin/runc","time":"2024-05-24T20:31:18+02:00"}

A sample container spec:

spec:
  containers:
  - image: kubeflow/ml-pipeline/visualization-server:2.0.0-alpha.7
    livenessProbe:
      exec:
        command:
        - wget
        - -q
        - -S
        - -O
        - '-'
        - http://localhost:8888/
    name: ml-pipeline-visualizationserver

The log entries come from runtime.go, starting at line 75, and from the runtime_low_level.go code.

IMHO, emitting these entries at DEBUG level instead should be fine; it would still allow easy debugging without affecting any functionality.
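
For illustration, a minimal sketch of what that change could look like, assuming a logrus-style logger (consistent with the JSON entries above); the function and variable names are illustrative, not the toolkit's own code.

package main

import "github.com/sirupsen/logrus"

// logStartup sketches the suggestion: emit the verbose startup messages at
// debug level so they only appear when log-level = "debug" is configured.
func logStartup(logger *logrus.Logger, configDump, lowLevelRuntime string) {
	// Currently these are emitted with Infof on every invocation, including
	// the exec calls made for liveness probes.
	logger.Debugf("Running with config:\n%s", configDump)
	logger.Debugf("Using low-level runtime %s", lowLevelRuntime)
}

func main() {
	logger := logrus.New()
	logger.SetFormatter(&logrus.JSONFormatter{})
	logger.SetLevel(logrus.InfoLevel) // default level: the debug entries above are suppressed

	logStartup(logger, "{ ... }", "/usr/sbin/runc")
}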

The current workaround is to set log-level = "error" (instead of the default "info" shown below) in /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml. For reference, here is the config.toml installed there by gpu-operator:


accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"

[nvidia-container-cli]
  environment = []
  ldconfig = "@/run/nvidia/driver/sbin/ldconfig.real"
  load-kmods = true
  path = "/usr/local/nvidia/toolkit/nvidia-container-cli"
  root = "/run/nvidia/driver"

[nvidia-container-runtime]
  log-level = "info"
  mode = "cdi"
  runtimes = ["docker-runc", "runc"]

  [nvidia-container-runtime.modes]

    [nvidia-container-runtime.modes.cdi]
      annotation-prefixes = ["nvidia.cdi.k8s.io/"]
      default-kind = "management.nvidia.com/gpu"
      spec-dirs = ["/etc/cdi", "/var/run/cdi"]

    [nvidia-container-runtime.modes.csv]
      mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
  path = "/usr/local/nvidia/toolkit/nvidia-container-runtime-hook"
  skip-mode-detection = true

[nvidia-ctk]
  path = "/usr/local/nvidia/toolkit/nvidia-ctk"

I use gpu-operator in the Kubernetes cluster; here is the runtime version info:

cd /usr/local/nvidia/toolkit

./nvidia-container-runtime --version
NVIDIA Container Runtime version 1.14.3
commit: 53b24618a542025b108239fe602e66e912b7d6e2
spec: 1.1.0-rc.2

runc version 1.1.12
commit: v1.1.12-0-g51d5e946
spec: 1.0.2-dev
go: go1.20.13
libseccomp: 2.5.4

Attempting to override the log-level by creating /etc/nvidia-container-runtime/config.toml did not work.

elezar commented 1 month ago

@weistonedawei for comparison, do you see the same behaviour with a similar setup that isn't using the nvidia-container-runtime?

Also, what are the contents of /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml?

weistonedawei commented 1 month ago

@elezar I updated my previous comment to include the config.toml content, which is installed by gpu-operator. The container engine is containerd.

Another installation, which uses Docker as the container engine and installs nvidia-container-runtime from the NVIDIA apt package repository on the worker node host, also shows excessive logging, but without the "Using config ..." entries. The version of nvidia-container-runtime there is 1.11.0. The '/run' utilization grows more slowly, but it is definitely filling up the '/run' tmpfs.

NVIDIA Container Runtime version 1.11.0
commit: d9de4a0
spec: 1.0.2-dev

On installations without nvidia-container-runtime, the '/run' tmpfs mount utilization stays below 1%. containerd is the engine.

Easily reproducible: 1) install gpu-operator in a k8s cluster, 2) create a Pod that uses an exec livenessProbe, 3) log in to the node on which that Pod is running and run df -h /run.

Thanks for checking it out.

elezar commented 1 month ago

@weistonedawei I think we can reduce the info logging in cases where we are not creating a container. I would propose

Note that this will only be available as an update to the 1.15.x version of the toolkit.
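
A minimal sketch of the idea above — only emitting the verbose config dump when the runtime is actually creating a container — assuming the wrapper can inspect the OCI subcommand it was invoked with and uses a logrus-style logger; the helper below is illustrative, not the actual change.

package main

import (
	"os"

	"github.com/sirupsen/logrus"
)

// hasCreateSubcommand reports whether this invocation is an OCI "create"
// command. Purely illustrative: the real runtime parses its arguments more
// carefully.
func hasCreateSubcommand(args []string) bool {
	for _, arg := range args[1:] {
		switch arg {
		case "create":
			return true
		case "exec", "start", "state", "kill", "delete":
			return false
		}
	}
	return false
}

func main() {
	logger := logrus.New()
	logger.SetFormatter(&logrus.JSONFormatter{})

	configDump := "{ ... }" // placeholder for the resolved runtime config

	if hasCreateSubcommand(os.Args) {
		// Keep the full config dump at info level when creating a container.
		logger.Infof("Running with config:\n%s", configDump)
	} else {
		// For exec (liveness probes), delete, etc., drop it to debug level.
		logger.Debugf("Running with config:\n%s", configDump)
	}
}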