NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

Excessive runtime logging could cause Kubernetes workload deployment failure #511

Open weistonedawei opened 1 month ago

weistonedawei commented 1 month ago

Observed a Kubernetes workload deployment failure caused by excessive logging to the /run/containerd/io.containerd.runtime.v2.task/k8s.io//log.json file. This drives the /run tmpfs mount to 100% utilization, which prevents further container creation on the affected node.

When a container spec uses an exec livenessProbe, the following entries are logged on every probe invocation:

{"level":"info","msg":"Running with config:\n{\n  \"DisableRequire\": false,\n  \"SwarmResource\": \"\",\n  \"AcceptEnvvarUnprivileged\": true,\n  \"AcceptDeviceListAsVolumeMounts\": false,\n  \"SupportedDriverCapabilities\": \"compat32,compute,display,graphics,ngx,utility,video\",\n  \"NVIDIAContainerCLIConfig\": {\n    \"Root\": \"/run/nvidia/driver\",\n    \"Path\": \"/usr/local/nvidia/toolkit/nvidia-container-cli\",\n    \"Environment\": [],\n    \"Debug\": \"\",\n    \"Ldcache\": \"\",\n    \"LoadKmods\": true,\n    \"NoPivot\": false,\n    \"NoCgroups\": false,\n    \"User\": \"\",\n    \"Ldconfig\": \"@/run/nvidia/driver/sbin/ldconfig.real\"\n  },\n  \"NVIDIACTKConfig\": {\n    \"Path\": \"/usr/local/nvidia/toolkit/nvidia-ctk\"\n  },\n  \"NVIDIAContainerRuntimeConfig\": {\n    \"DebugFilePath\": \"/dev/null\",\n    \"LogLevel\": \"info\",\n    \"Runtimes\": [\n      \"docker-runc\",\n      \"runc\"\n    ],\n    \"Mode\": \"cdi\",\n    \"Modes\": {\n      \"CSV\": {\n        \"MountSpecPath\": \"/etc/nvidia-container-runtime/host-files-for-container.d\"\n      },\n      \"CDI\": {\n        \"SpecDirs\": [\n          \"/etc/cdi\",\n          \"/var/run/cdi\"\n        ],\n        \"DefaultKind\": \"management.nvidia.com/gpu\",\n        \"AnnotationPrefixes\": [\n          \"nvidia.cdi.k8s.io/\"\n        ]\n      }\n    }\n  },\n  \"NVIDIAContainerRuntimeHookConfig\": {\n    \"Path\": \"/usr/local/nvidia/toolkit/nvidia-container-runtime-hook\",\n    \"SkipModeDetection\": true\n  }\n}","time":"2024-05-24T20:31:18+02:00"}
{"level":"info","msg":"Using low-level runtime /usr/sbin/runc","time":"2024-05-24T20:31:18+02:00"}

A sample container spec:

spec:
  containers:
  - image: kubeflow/ml-pipeline/visualization-server:2.0.0-alpha.7
    livenessProbe:
      exec:
        command:
        - wget
        - -q
        - -S
        - -O
        - '-'
        - http://localhost:8888/
    name: ml-pipeline-visualizationserver

The log entries come from runtime.go, starting at line 75, and from the runtime_low_level.go code.

IMHO, emitting these entries at DEBUG level instead should be fine; it would still allow easy debugging without affecting any functionality.
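
For illustration, a minimal sketch of what that change could look like, assuming a logrus-style logger (consistent with the JSON entries above); the function and variable names are illustrative, not the toolkit's own code.

package main

import "github.com/sirupsen/logrus"

// logStartup sketches the suggestion: emit the verbose startup messages at
// debug level so they only appear when log-level = "debug" is configured.
func logStartup(logger *logrus.Logger, configDump, lowLevelRuntime string) {
	// Currently these are emitted with Infof on every invocation, including
	// the exec calls made for liveness probes.
	logger.Debugf("Running with config:\n%s", configDump)
	logger.Debugf("Using low-level runtime %s", lowLevelRuntime)
}

func main() {
	logger := logrus.New()
	logger.SetFormatter(&logrus.JSONFormatter{})
	logger.SetLevel(logrus.InfoLevel) // default level: the debug entries above are suppressed

	logStartup(logger, "{ ... }", "/usr/sbin/runc")
}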

The current workaround is to set log-level = "error" (instead of the default "info" shown below) in /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml. For reference, here is the config.toml installed there by gpu-operator:


accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"

[nvidia-container-cli]
  environment = []
  ldconfig = "@/run/nvidia/driver/sbin/ldconfig.real"
  load-kmods = true
  path = "/usr/local/nvidia/toolkit/nvidia-container-cli"
  root = "/run/nvidia/driver"

[nvidia-container-runtime]
  log-level = "info"
  mode = "cdi"
  runtimes = ["docker-runc", "runc"]

  [nvidia-container-runtime.modes]

    [nvidia-container-runtime.modes.cdi]
      annotation-prefixes = ["nvidia.cdi.k8s.io/"]
      default-kind = "management.nvidia.com/gpu"
      spec-dirs = ["/etc/cdi", "/var/run/cdi"]

    [nvidia-container-runtime.modes.csv]
      mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
  path = "/usr/local/nvidia/toolkit/nvidia-container-runtime-hook"
  skip-mode-detection = true

[nvidia-ctk]
  path = "/usr/local/nvidia/toolkit/nvidia-ctk"

I use gpu-operator in the Kubernetes cluster; here is the runtime version info:

cd /usr/local/nvidia/toolkit

./nvidia-container-runtime --version
NVIDIA Container Runtime version 1.14.3
commit: 53b24618a542025b108239fe602e66e912b7d6e2
spec: 1.1.0-rc.2

runc version 1.1.12
commit: v1.1.12-0-g51d5e946
spec: 1.0.2-dev
go: go1.20.13
libseccomp: 2.5.4

Attempting to override the log-level by creating /etc/nvidia-container-runtime/config.toml did not work.

elezar commented 1 month ago

@weistonedawei for comparison, do you see the same behaviour with a similar setup that isn't using the nvidia-container-runtime?

Also, what are the contents of /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml?

weistonedawei commented 1 month ago

@elezar I updated my previous comment to include the config.toml content, which is installed by gpu-operator. The container engine is containerd.

Another installation, which uses Docker as the container engine and installs nvidia-container-runtime from the NVIDIA apt package repository on the worker node host, also shows excessive logging, but without the "Using config ..." entries. The version of nvidia-container-runtime there is 1.11.0. The '/run' utilization grows more slowly, but it is definitely filling up the '/run' tmpfs.

NVIDIA Container Runtime version 1.11.0
commit: d9de4a0
spec: 1.0.2-dev

On installations without nvidia-container-runtime, the '/run' tmpfs mount utilization stays below 1%. containerd is the engine.

Easily reproducible: 1) install gpu-operator in a k8s cluster, 2) create a Pod that uses an exec livenessProbe, 3) log in to the node on which that Pod is running and run df -h /run.

Thanks for checking it out.

elezar commented 1 month ago

@weistonedawei I think we can reduce the info logging in cases where we are not creating a container. I would propose

Note that this will only be available as an update to the 1.15.x version of the toolkit.
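
A minimal sketch of the idea above — only emitting the verbose config dump when the runtime is actually creating a container — assuming the wrapper can inspect the OCI subcommand it was invoked with and uses a logrus-style logger; the helper below is illustrative, not the actual change.

package main

import (
	"os"

	"github.com/sirupsen/logrus"
)

// hasCreateSubcommand reports whether this invocation is an OCI "create"
// command. Purely illustrative: the real runtime parses its arguments more
// carefully.
func hasCreateSubcommand(args []string) bool {
	for _, arg := range args[1:] {
		switch arg {
		case "create":
			return true
		case "exec", "start", "state", "kill", "delete":
			return false
		}
	}
	return false
}

func main() {
	logger := logrus.New()
	logger.SetFormatter(&logrus.JSONFormatter{})

	configDump := "{ ... }" // placeholder for the resolved runtime config

	if hasCreateSubcommand(os.Args) {
		// Keep the full config dump at info level when creating a container.
		logger.Infof("Running with config:\n%s", configDump)
	} else {
		// For exec (liveness probes), delete, etc., drop it to debug level.
		logger.Debugf("Running with config:\n%s", configDump)
	}
}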