NVIDIA / dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0

Failed to add DCGM_EXP_CLOCK_EVENTS_COUNT #317

Open CodeBrek opened 5 months ago

CodeBrek commented 5 months ago

What is the version?

3.3.5

What happened?

GPU: A30. GPU driver: 470.103.01. When I added DCGM_EXP_CLOCK_EVENTS_COUNT to collect data from an A30 in MIG 6G mode, collection failed with "Failed to gather clock event" error="dcgmGetValuesSince_v2 failed with error code -16". Once DCGM_EXP_CLOCK_EVENTS_COUNT is added, the exporter starts reporting this error and all metrics disappear.

What did you expect to happen?

The DCGM_EXP_CLOCK_EVENTS_COUNT metric is exported without the "Failed to gather clock event" error.

What is the GPU model?

A30

What is the environment?

In a pod based on K8S

How did you deploy the dcgm-exporter and what is the configuration?

Deploy DCGM-Exporter with the image nvcr.io/nvidia/cloud-native/dcgm:3.3.5-1-ubi9

How to reproduce the issue?

No response

Anything else we need to know?

No response

nvvfedorov commented 5 months ago

Thanks for the bug report. Unfortunately, we cannot reproduce the bug. Please do the following:

  1. Run DCGM-exporter in debug mode and provide logs to us.

Example:

sudo docker run -v /tmp/default-counters.csv:/etc/dcgm-exporter/default-counters.csv --net host --privileged --gpus all --cap-add SYS_ADMIN --rm -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.1-ubi9 -f /etc/dcgm-exporter/default-counters.csv --debug --enable-dcgm-log --dcgm-log-level=DEBUG
  2. Try to update the NVIDIA driver. I noticed that you use 470.103.01, while the current version is 550.54.15.

CodeBrek commented 5 months ago

Hi nvvfedorov! Thanks for the suggestion! Since we have been running 470.103.01 in the production environment for a long time, it's a bit difficult for us to update the GPU driver.

We tried the first suggestion and have attached the log captured after enabling debug mode.

nvvfedorov commented 5 months ago

@CodeBrek. Thank you for the logs. I need the following information:

  1. Docker image name and version. The provided image name: nvcr.io/nvidia/cloud-native/dcgm:3.3.5-1-ubi9 is not a DCGM-exporter image.
  2. Did you deploy the DCGM-exporter in Docker, on K8S, or K8S with the help of the GPU operator?
  3. What is the DCGM-exporter configuration file that contains the enabled counters?

I appreciate any help you can provide.

CodeBrek commented 5 months ago

Really appreciate your feedback!

Here's the information I collected. Please review it.

DCGM-exporter image version: nvidia/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04
DCGM version: nvidia/dcgm:3.3.5-1-ubuntu22.04
GPU server: A30 * 4
GPU driver version: 470.103.01 (for cluster stability reasons, we cannot upgrade for now)
MIG mode: MIG-6G, 16 instances in an A30 server with 4 physical A30 GPUs

DCGM-exporter configuration:

apiVersion: apps.kruise.io/v1alpha1
kind: DaemonSet
metadata:
  name: "dcgm-exporter"
  namespace: "kube-system"
  labels:
    app.kubernetes.io/name: "dcgm-exporter"
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 15%
  selector:
    matchLabels:
      app.kubernetes.io/name: "dcgm-exporter"
  template:
    metadata:
      labels:
        app.kubernetes.io/name: "dcgm-exporter"
      name: "dcgm-exporter"
    spec:
      tolerations:
        - operator: Exists
      containers:
        - image: nvidia/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 9400
              protocol: TCP
          env:
            - name: "DCGM_EXPORTER_LISTEN"
              value: ":9400"
            - name: "DCGM_EXPORTER_KUBERNETES"
              value: "true"
            - name: "DCGM_EXPORTER_CONFIGMAP_DATA"
              value: 'kube-system:dcgm-metrics'
            - name: GOMAXPROCS
              value: "2"
          name: "dcgm-exporter"
          args: ["-f", "/tmp/all-metrics.csv", "-r", "localhost:5555"]
          securityContext:
            privileged: true
            runAsNonRoot: false
            runAsUser: 0
          resources:
            requests:
              cpu: 0
              memory: 0
            limits:
              cpu: 200m
              memory: 200Mi
          volumeMounts:
            - name: "pod-gpu-resources"
              readOnly: true
              mountPath: "/var/lib/kubelet/pod-resources"
        - image: nvidia/dcgm:3.3.5-1-ubuntu22.04
          imagePullPolicy: Always
          lifecycle:
            type: Sidecar
          name: "nv-hostengine"
          securityContext:
            privileged: true
            runAsNonRoot: false
            runAsUser: 0
          resources:
            requests:
              cpu: 0
              memory: 0
            limits:
              cpu: 700m
              memory: 500Mi
      serviceAccountName: dcgm-exporter
      securityContext: {}
      priorityClassName: "system-node-critical"
      hostNetwork: true
      volumes:
        - name: "pod-gpu-resources"
          hostPath:
            path: "/opt/mt/kubelet/workdir/pod-resources"

---

apiVersion: v1
data:
  metrics: |-
    DCGM_FI_DEV_SM_CLOCK,  gauge, SM clock frequency (in MHz).
    DCGM_FI_DEV_MEM_CLOCK, gauge, Memory clock frequency (in MHz).
    DCGM_FI_DEV_MEMORY_TEMP, gauge, Memory temperature (in C).
    DCGM_FI_DEV_GPU_TEMP,    gauge, GPU temperature (in C).
    DCGM_FI_DEV_POWER_USAGE,              gauge, Power draw (in W).
    DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION, counter, Total energy consumption since boot (in mJ).
    DCGM_FI_DEV_PCIE_TX_THROUGHPUT,  counter, Total number of bytes transmitted through PCIe TX (in KB) via NVML.
    DCGM_FI_DEV_PCIE_RX_THROUGHPUT,  counter, Total number of bytes received through PCIe RX (in KB) via NVML.
    DCGM_FI_DEV_PCIE_REPLAY_COUNTER, counter, Total number of PCIe retries.
    DCGM_FI_DEV_GPU_UTIL,      gauge, GPU utilization (in %).
    DCGM_FI_DEV_MEM_COPY_UTIL, gauge, Memory utilization (in %).
    DCGM_FI_DEV_ENC_UTIL,      gauge, Encoder utilization (in %).
    DCGM_FI_DEV_DEC_UTIL ,     gauge, Decoder utilization (in %).
    DCGM_FI_DEV_XID_ERRORS,            gauge,   Value of the last XID error encountered.
    DCGM_FI_DEV_FB_FREE, gauge, Framebuffer memory free (in MiB).
    DCGM_FI_DEV_FB_USED, gauge, Framebuffer memory used (in MiB).
    DCGM_FI_DEV_VGPU_LICENSE_STATUS, gauge, vGPU License status
    DCGM_FI_DEV_UNCORRECTABLE_REMAPPED_ROWS, counter, Number of remapped rows for uncorrectable errors
    DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS,   counter, Number of remapped rows for correctable errors
    DCGM_FI_DEV_ROW_REMAP_FAILURE,           gauge,   Whether remapping of rows has failed
    DCGM_FI_PROF_GR_ENGINE_ACTIVE,   gauge, Ratio of time the graphics engine is active (in %).
    DCGM_FI_PROF_SM_ACTIVE,          gauge, The ratio of cycles an SM has at least 1 warp assigned (in %).
    DCGM_FI_PROF_SM_OCCUPANCY,       gauge, The ratio of number of warps resident on an SM (in %).
    DCGM_FI_PROF_PIPE_TENSOR_ACTIVE, gauge, Ratio of cycles the tensor (HMMA) pipe is active (in %).
    DCGM_FI_PROF_DRAM_ACTIVE,        gauge, Ratio of cycles the device memory interface is active sending or receiving data (in %).
    DCGM_FI_PROF_PCIE_TX_BYTES,      counter, The number of bytes of active pcie tx data including both header and payload.
    DCGM_FI_PROF_PCIE_RX_BYTES,      counter, The number of bytes of active pcie rx data including both header and payload.
    DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL,            counter, Total number of NVLink bandwidth counters for all lanes
    DCGM_FI_DEV_NVLINK_BANDWIDTH_L0,               counter, The number of bytes of active NVLink rx or tx data including both header and payload.
    DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_TOTAL, counter, Total number of NVLink flow-control CRC errors.
    DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_TOTAL, counter, Total number of NVLink data CRC errors.
    DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_TOTAL,   counter, Total number of NVLink retries.
    DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_TOTAL, counter, Total number of NVLink recovery errors.
    DCGM_FI_PROF_PIPE_FP64_ACTIVE,   gauge, Ratio of cycles the fp64 pipes are active (in %).
    DCGM_FI_PROF_PIPE_FP32_ACTIVE,   gauge, Ratio of cycles the fp32 pipes are active (in %).
    DCGM_FI_PROF_PIPE_FP16_ACTIVE,   gauge, Ratio of cycles the fp16 pipes are active (in %).
    DCGM_EXP_CLOCK_EVENTS_COUNT,    counter, Count of clock events within the user-specified time window (see clock-events-count-window-size param).
kind: ConfigMap
metadata:
  name: dcgm-metrics
  namespace: kube-system
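
When narrowing down which counter triggers a failure, it can help to list the configured fields by family before removing them one at a time. The following is a minimal sketch, not part of dcgm-exporter: it parses a few lines copied from the ConfigMap above and groups the field names by prefix. The helper names `parse_counters` and `group_by_prefix` are hypothetical, and the handling of `#` comment lines is an assumption about the counters file format.

```python
# A few lines copied from the ConfigMap above (not the full list).
COUNTERS_CSV = """\
DCGM_FI_DEV_SM_CLOCK,  gauge, SM clock frequency (in MHz).
DCGM_FI_DEV_DEC_UTIL ,     gauge, Decoder utilization (in %).
DCGM_FI_PROF_SM_ACTIVE,          gauge, The ratio of cycles an SM has at least 1 warp assigned (in %).
DCGM_EXP_CLOCK_EVENTS_COUNT,    counter, Count of clock events within the user-specified time window (see clock-events-count-window-size param).
"""


def parse_counters(text):
    """Return (field, type, help) tuples, skipping blank lines and '#' comments."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: lines starting with '#' are comments.
        if not stripped or stripped.startswith("#"):
            continue
        # Split on the first two commas only, so commas in the help text survive.
        parts = [part.strip() for part in stripped.split(",", 2)]
        if len(parts) != 3:
            raise ValueError(f"malformed counter line: {line!r}")
        rows.append(tuple(parts))
    return rows


def group_by_prefix(rows):
    """Group field names by family: DCGM_FI_DEV, DCGM_FI_PROF, DCGM_EXP, ..."""
    groups = {}
    for field, _type, _help in rows:
        tokens = field.split("_")
        # DCGM_FI_* fields carry a three-token family; others (DCGM_EXP) two.
        depth = 3 if field.startswith("DCGM_FI_") else 2
        groups.setdefault("_".join(tokens[:depth]), []).append(field)
    return groups


if __name__ == "__main__":
    for prefix, fields in group_by_prefix(parse_counters(COUNTERS_CSV)).items():
        print(prefix, len(fields), fields)
```

Grouping this way makes it easy to see that DCGM_EXP_CLOCK_EVENTS_COUNT is the only field outside the DCGM_FI_* families in this configuration.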

Logs just after restarting the DCGM-exporter:

2024/04/29 12:09:24 maxprocs: Honoring GOMAXPROCS="2" as set in environment
time="2024-04-29T12:09:24Z" level=info msg="Starting dcgm-exporter"
time="2024-04-29T12:09:24Z" level=info msg="Attemping to connect to remote hostengine at localhost:5555"
time="2024-04-29T12:09:25Z" level=info msg="DCGM successfully initialized!"
time="2024-04-29T12:09:25Z" level=info msg="Collecting DCP Metrics"
time="2024-04-29T12:09:25Z" level=info msg="Initializing system entities of type: GPU"
time="2024-04-29T12:09:26Z" level=info msg="Not collecting NvSwitch metrics; no fields to watch for device type: 3"
time="2024-04-29T12:09:26Z" level=info msg="Not collecting NvLink metrics; no fields to watch for device type: 6"
time="2024-04-29T12:09:26Z" level=info msg="Not collecting CPU metrics; no fields to watch for device type: 7"
time="2024-04-29T12:09:26Z" level=info msg="Not collecting CPU Core metrics; no fields to watch for device type: 8"
time="2024-04-29T12:09:26Z" level=info msg="Kubernetes metrics collection enabled!"
time="2024-04-29T12:09:26Z" level=info msg="Kubernetes metrics collection enabled!"
time="2024-04-29T12:09:26Z" level=info msg="DCGM_EXP_CLOCK_EVENTS_COUNT collector initialized"
time="2024-04-29T12:09:26Z" level=info msg="Pipeline starting"
time="2024-04-29T12:09:26Z" level=info msg="Starting webserver"
time="2024-04-29T12:09:26Z" level=info msg="Listening on" address="[::]:9400"
time="2024-04-29T12:09:26Z" level=info msg="TLS is disabled." address="[::]:9400" http2=false
time="2024-04-29T12:10:21Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/04/29 12:10:21 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-04-29T12:10:22Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/04/29 12:10:22 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-04-29T12:10:22Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/04/29 12:10:22 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-04-29T12:10:23Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/04/29 12:10:23 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-04-29T12:10:23Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/04/29 12:10:23 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-04-29T12:10:24Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/04/29 12:10:24 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-04-29T12:10:24Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/04/29 12:10:24 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-04-29T12:10:24Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/04/29 12:10:24 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-04-29T12:10:25Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/04/29 12:10:25 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-04-29T12:10:25Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/04/29 12:10:25 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-04-29T12:10:26Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/04/29 12:10:26 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
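
The repeated errors in the log above can be tallied with a short script, which is handy when a debug log runs to thousands of lines. This is a rough sketch under the assumption that the structured lines follow the key=value layout shown above; `parse_line` and `summarize` are hypothetical helper names, and lines without a `level` field (such as the `http: superfluous response.WriteHeader` lines) are simply skipped.

```python
import re
from collections import Counter

# Matches key="quoted value" or key=bare pairs, as in the log lines above.
PAIR_RE = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S+))')

# Sample lines copied from the log above.
SAMPLE_LOG = [
    '2024/04/29 12:09:24 maxprocs: Honoring GOMAXPROCS="2" as set in environment',
    'time="2024-04-29T12:09:26Z" level=info msg="Pipeline starting"',
    'time="2024-04-29T12:10:21Z" level=error msg="Failed to write response." '
    'error="dcgmGetValuesSince_v2 failed with error code -16"',
    '2024/04/29 12:10:21 http: superfluous response.WriteHeader call from '
    'github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)',
    'time="2024-04-29T12:10:22Z" level=error msg="Failed to write response." '
    'error="dcgmGetValuesSince_v2 failed with error code -16"',
]


def parse_line(line):
    """Return the key/value pairs of a structured line, or None if it has no level."""
    fields = {}
    for key, quoted, bare in PAIR_RE.findall(line):
        fields[key] = quoted if quoted else bare
    return fields if "level" in fields else None


def summarize(lines):
    """Count error-level lines, keyed by (msg, error)."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields and fields["level"] == "error":
            counts[(fields.get("msg", ""), fields.get("error", ""))] += 1
    return counts


if __name__ == "__main__":
    for (msg, err), count in summarize(SAMPLE_LOG).most_common():
        print(f"{count}x msg={msg!r} error={err!r}")
```

In the excerpt above, every error-level line carries the same message and error text, so the tally collapses to a single entry.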

Prometheus scrapes the /metrics endpoint every 30 seconds but is unable to connect. Running curl against the endpoint returns the following output:
# HELP DCGM_FI_DEV_SM_CLOCK SM clock frequency (in MHz).
# TYPE DCGM_FI_DEV_SM_CLOCK gauge
DCGM_FI_DEV_SM_CLOCK{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 1440
DCGM_FI_DEV_SM_CLOCK{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 1440
# HELP DCGM_FI_DEV_MEM_CLOCK Memory clock frequency (in MHz).
# TYPE DCGM_FI_DEV_MEM_CLOCK gauge
DCGM_FI_DEV_MEM_CLOCK{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 1215
DCGM_FI_DEV_MEM_CLOCK{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 1215
# HELP DCGM_FI_DEV_MEMORY_TEMP Memory temperature (in C).
# TYPE DCGM_FI_DEV_MEMORY_TEMP gauge
DCGM_FI_DEV_MEMORY_TEMP{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_MEMORY_TEMP{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_MEMORY_TEMP{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_MEMORY_TEMP{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_MEMORY_TEMP{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 34
DCGM_FI_DEV_MEMORY_TEMP{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 34
DCGM_FI_DEV_MEMORY_TEMP{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 34
DCGM_FI_DEV_MEMORY_TEMP{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 34
DCGM_FI_DEV_MEMORY_TEMP{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 48
DCGM_FI_DEV_MEMORY_TEMP{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 48
DCGM_FI_DEV_MEMORY_TEMP{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 48
DCGM_FI_DEV_MEMORY_TEMP{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 48
DCGM_FI_DEV_MEMORY_TEMP{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 32
DCGM_FI_DEV_MEMORY_TEMP{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 32
DCGM_FI_DEV_MEMORY_TEMP{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 32
DCGM_FI_DEV_MEMORY_TEMP{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 32
# HELP DCGM_FI_DEV_GPU_TEMP GPU temperature (in C).
# TYPE DCGM_FI_DEV_GPU_TEMP gauge
DCGM_FI_DEV_GPU_TEMP{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 35
DCGM_FI_DEV_GPU_TEMP{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 34
DCGM_FI_DEV_GPU_TEMP{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 34
DCGM_FI_DEV_GPU_TEMP{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 34
DCGM_FI_DEV_GPU_TEMP{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 34
# HELP DCGM_FI_DEV_POWER_USAGE Power draw (in W).
# TYPE DCGM_FI_DEV_POWER_USAGE gauge
DCGM_FI_DEV_POWER_USAGE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 52.636000
DCGM_FI_DEV_POWER_USAGE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 52.636000
DCGM_FI_DEV_POWER_USAGE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 52.746000
DCGM_FI_DEV_POWER_USAGE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 52.746000
DCGM_FI_DEV_POWER_USAGE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 56.121000
DCGM_FI_DEV_POWER_USAGE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 56.127000
DCGM_FI_DEV_POWER_USAGE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 56.127000
DCGM_FI_DEV_POWER_USAGE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 56.127000
DCGM_FI_DEV_POWER_USAGE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 57.643000
DCGM_FI_DEV_POWER_USAGE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 57.561000
DCGM_FI_DEV_POWER_USAGE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 57.561000
DCGM_FI_DEV_POWER_USAGE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 57.643000
DCGM_FI_DEV_POWER_USAGE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 56.575000
DCGM_FI_DEV_POWER_USAGE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 56.381000
DCGM_FI_DEV_POWER_USAGE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 56.381000
DCGM_FI_DEV_POWER_USAGE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 56.478000
# HELP DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION Total energy consumption since boot (in mJ).
# TYPE DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION counter
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 126272101025
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 126272101025
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 126272106318
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 126272101025
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 134473486744
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 134473486744
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 134473486744
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 134473481087
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 138249198597
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 138249198597
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 138249198597
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 138249198597
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 135188274123
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 135188279787
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 135188274123
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 135188274123
# HELP DCGM_FI_DEV_PCIE_REPLAY_COUNTER Total number of PCIe retries.
# TYPE DCGM_FI_DEV_PCIE_REPLAY_COUNTER counter
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
# HELP DCGM_FI_DEV_XID_ERRORS Value of the last XID error encountered.
# TYPE DCGM_FI_DEV_XID_ERRORS gauge
DCGM_FI_DEV_XID_ERRORS{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_XID_ERRORS{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
# HELP DCGM_FI_DEV_FB_FREE Framebuffer memory free (in MiB).
# TYPE DCGM_FI_DEV_FB_FREE gauge
DCGM_FI_DEV_FB_FREE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 6012
DCGM_FI_DEV_FB_FREE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 6012
# HELP DCGM_FI_DEV_FB_USED Framebuffer memory used (in MiB).
# TYPE DCGM_FI_DEV_FB_USED gauge
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 3
DCGM_FI_DEV_FB_USED{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 3
# HELP DCGM_FI_DEV_NVLINK_BANDWIDTH_L0 The number of bytes of active NVLink rx or tx data including both header and payload.
# TYPE DCGM_FI_DEV_NVLINK_BANDWIDTH_L0 counter
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_L0{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
# HELP DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL Total number of NVLink bandwidth counters for all lanes
# TYPE DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL counter
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
# HELP DCGM_FI_DEV_VGPU_LICENSE_STATUS vGPU License status
# TYPE DCGM_FI_DEV_VGPU_LICENSE_STATUS gauge
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0
# HELP DCGM_FI_PROF_GR_ENGINE_ACTIVE Ratio of time the graphics engine is active (in %).
# TYPE DCGM_FI_PROF_GR_ENGINE_ACTIVE gauge
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
# HELP DCGM_FI_PROF_SM_ACTIVE The ratio of cycles an SM has at least 1 warp assigned (in %).
# TYPE DCGM_FI_PROF_SM_ACTIVE gauge
DCGM_FI_PROF_SM_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
# HELP DCGM_FI_PROF_SM_OCCUPANCY The ratio of number of warps resident on an SM (in %).
# TYPE DCGM_FI_PROF_SM_OCCUPANCY gauge
DCGM_FI_PROF_SM_OCCUPANCY{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_SM_OCCUPANCY{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
# HELP DCGM_FI_PROF_PIPE_TENSOR_ACTIVE Ratio of cycles the tensor (HMMA) pipe is active (in %).
# TYPE DCGM_FI_PROF_PIPE_TENSOR_ACTIVE gauge
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
# HELP DCGM_FI_PROF_DRAM_ACTIVE Ratio of cycles the device memory interface is active sending or receiving data (in %).
# TYPE DCGM_FI_PROF_DRAM_ACTIVE gauge
DCGM_FI_PROF_DRAM_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_DRAM_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
# HELP DCGM_FI_PROF_PIPE_FP64_ACTIVE Ratio of cycles the fp64 pipes are active (in %).
# TYPE DCGM_FI_PROF_PIPE_FP64_ACTIVE gauge
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP64_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
# HELP DCGM_FI_PROF_PIPE_FP32_ACTIVE Ratio of cycles the fp32 pipes are active (in %).
# TYPE DCGM_FI_PROF_PIPE_FP32_ACTIVE gauge
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP32_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
# HELP DCGM_FI_PROF_PIPE_FP16_ACTIVE Ratio of cycles the fp16 pipes are active (in %).
# TYPE DCGM_FI_PROF_PIPE_FP16_ACTIVE gauge
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="0",UUID="GPU-265c9814-6156-355b-e589-ff2064894f16",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="1",UUID="GPU-2d20791a-b4a6-ba59-97f1-e95b99fe5b49",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="2",UUID="GPU-3f61f0fb-57f5-fd80-ebb6-54347a8178dd",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="a30-node-test-test0003"} 0.000000
DCGM_FI_PROF_PIPE_FP16_ACTIVE{gpu="3",UUID="GPU-67f36884-ee7a-2684-63ca-2c3a309e2cb5",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="a30-node-test-test0003"} 0.000000
failed to write response

Errors from DCGM-exporter container:

level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)

nvvfedorov commented 4 months ago

@CodeBrek, unfortunately I cannot reproduce the issue at the moment, and I need your help.

I don't see where you mount "/tmp/all-metrics.csv" in the provided daemonset.

Please do the following things:

  1. Create a configuration file with the following content:

DCGM_FI_DEV_CLOCK_THROTTLE_REASONS,gauge,
DCGM_EXP_CLOCK_EVENTS_COUNT,counter, Count of clock events within the user-specified time window (see clock-events-count-window-size param).

  2. Try to run the dcgm-exporter with only those two metrics.

If you see errors, please make sure that you passed a valid configuration file to the dcgm-exporter.
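A quick check of the counters file can rule out formatting mistakes before the exporter is started. The sketch below is a hypothetical helper, not part of dcgm-exporter. It assumes each non-comment line has the shape `FIELD_NAME,type,help text`, as in the two lines above, and that the accepted types are `gauge`, `counter`, and `label`; that set is an assumption, not something confirmed in this thread.

```python
# Assumption: these are the metric types a counters file may use.
ALLOWED_TYPES = {"gauge", "counter", "label"}


def check_counters(text):
    """Return (fields, problems) for the contents of a counters CSV file."""
    fields, problems = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split into at most three columns so commas in the help text survive.
        parts = [p.strip() for p in line.split(",", 2)]
        if len(parts) < 3:
            problems.append(f"line {lineno}: expected 3 comma-separated columns")
            continue
        name, metric_type, _help = parts
        if metric_type not in ALLOWED_TYPES:
            problems.append(f"line {lineno}: unknown type {metric_type!r}")
            continue
        fields.append(name)
    return fields, problems


# The two-metric configuration requested above.
SAMPLE = """\
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS,gauge,
DCGM_EXP_CLOCK_EVENTS_COUNT,counter, Count of clock events within the user-specified time window (see clock-events-count-window-size param).
"""

if __name__ == "__main__":
    fields, problems = check_counters(SAMPLE)
    print("fields:", fields)
    print("problems:", problems)
```

An empty problem list only says the file is well formed under these assumptions; it does not show that DCGM can actually serve each field on a given driver.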

CodeBrek commented 3 months ago

The complete YAML required for reproduction is as follows:

apiVersion: apps.kruise.io/v1alpha1
kind: DaemonSet
metadata:
  name: "dcgm-exporter"
  namespace: "kube-system"
  labels:
    app.kubernetes.io/name: "dcgm-exporter"
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 15%
  selector:
    matchLabels:
      app.kubernetes.io/name: "dcgm-exporter"
  template:
    metadata:
      labels:
        app.kubernetes.io/name: "dcgm-exporter"
      name: "dcgm-exporter"
    spec:
      tolerations:
        - operator: Exists
      containers:
        - image: nvidia/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 9400
              protocol: TCP
          env:
            - name: "DCGM_EXPORTER_LISTEN"
              value: ":9400"
            - name: "DCGM_EXPORTER_KUBERNETES"
              value: "true"
            - name: "DCGM_EXPORTER_CONFIGMAP_DATA"
              value: 'kube-system:dcgm-metrics'
            - name: GOMAXPROCS
              value: "2"
          name: "dcgm-exporter"
          args: ["-r", "localhost:5555"]
          securityContext:
            privileged: true
            runAsNonRoot: false
            runAsUser: 0
          resources:
            requests:
              cpu: 0
              memory: 0
            limits:
              cpu: 200m
              memory: 200Mi
          volumeMounts:
            - name: "pod-gpu-resources"
              readOnly: true
              mountPath: "/var/lib/kubelet/pod-resources"
        - image: nvidia/dcgm:3.3.5-1-ubuntu22.04
          imagePullPolicy: Always
          lifecycle:
            type: Sidecar
          name: "nv-hostengine"
          securityContext:
            privileged: true
            runAsNonRoot: false
            runAsUser: 0
          resources:
            requests:
              cpu: 0
              memory: 0
            limits:
              cpu: 700m
              memory: 500Mi
      serviceAccountName: dcgm-exporter
      securityContext: {}
      priorityClassName: "system-node-critical"
      hostNetwork: true
      volumes:
        - name: "pod-gpu-resources"
          hostPath:
            path: "/opt/mt/kubelet/workdir/pod-resources"

---

apiVersion: v1
data:
  metrics: |-
    DCGM_FI_DEV_CLOCK_THROTTLE_REASONS,gauge,
    DCGM_EXP_CLOCK_EVENTS_COUNT,counter, Count of clock events within the user-specified time window (see clock-events-count-window-size param).
kind: ConfigMap
metadata:
  name: dcgm-metrics
  namespace: kube-system

Running the following command on the machine where the dcgm-exporter pod reports the error:

$ curl 127.0.0.1:9400/metrics
failed to write response
$ curl 127.0.0.1:9400/metrics
failed to write response
$ curl http://127.0.0.1:9400/metrics
failed to write response

Output logs of dcgm-exporter from startup:

2024/06/07 07:34:21 maxprocs: Honoring GOMAXPROCS="2" as set in environment
time="2024-06-07T07:34:21Z" level=info msg="Starting dcgm-exporter"
time="2024-06-07T07:34:21Z" level=info msg="Attemping to connect to remote hostengine at localhost:5555"
time="2024-06-07T07:34:22Z" level=info msg="DCGM successfully initialized!"
time="2024-06-07T07:34:22Z" level=info msg="Collecting DCP Metrics"
time="2024-06-07T07:34:22Z" level=info msg="Initializing system entities of type: GPU"
time="2024-06-07T07:34:23Z" level=info msg="Not collecting NvSwitch metrics; no fields to watch for device type: 3"
time="2024-06-07T07:34:23Z" level=info msg="Not collecting NvLink metrics; no fields to watch for device type: 6"
time="2024-06-07T07:34:23Z" level=info msg="Not collecting CPU metrics; no fields to watch for device type: 7"
time="2024-06-07T07:34:23Z" level=info msg="Not collecting CPU Core metrics; no fields to watch for device type: 8"
time="2024-06-07T07:34:23Z" level=info msg="Kubernetes metrics collection enabled!"
time="2024-06-07T07:34:23Z" level=info msg="Kubernetes metrics collection enabled!"
time="2024-06-07T07:34:23Z" level=info msg="DCGM_EXP_CLOCK_EVENTS_COUNT collector initialized"
time="2024-06-07T07:34:23Z" level=info msg="Pipeline starting"
time="2024-06-07T07:34:23Z" level=info msg="Starting webserver"
time="2024-06-07T07:34:23Z" level=info msg="Listening on" address="[::]:9400"
time="2024-06-07T07:34:23Z" level=info msg="TLS is disabled." address="[::]:9400" http2=false
time="2024-06-07T07:34:31Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/06/07 07:34:31 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-06-07T07:34:36Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/06/07 07:34:36 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-06-07T07:34:42Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/06/07 07:34:42 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-06-07T07:34:45Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/06/07 07:34:45 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-06-07T07:34:49Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/06/07 07:34:49 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-06-07T07:34:50Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/06/07 07:34:50 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-06-07T07:35:01Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/06/07 07:35:01 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-06-07T07:35:06Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/06/07 07:35:06 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-06-07T07:35:19Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/06/07 07:35:19 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-06-07T07:35:31Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/06/07 07:35:31 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-06-07T07:35:36Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
2024/06/07 07:35:36 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)

Output logs of nv-hostengine:

Started host engine version 3.3.5 using port number: 5555
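To see how often the scrape fails and whether it is always the same error, the dcgm-exporter log above can be summarized with a small script. This is a sketch that assumes error lines keep the `level=error msg="..." error="..."` shape shown in the log; `summarize_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches key="value" pairs as they appear in the log lines above.
PAIR = re.compile(r'(\w+)="([^"]*)"')


def summarize_errors(log_text):
    """Count error-level log lines, grouped by their error="..." value."""
    counts = Counter()
    for line in log_text.splitlines():
        if "level=error" not in line:
            continue  # ignore info lines and the plain "http:" lines
        pairs = dict(PAIR.findall(line))
        counts[pairs.get("error", "<no error field>")] += 1
    return counts


# Three lines copied from the log above.
SAMPLE = '''\
time="2024-06-07T07:34:23Z" level=info msg="Pipeline starting"
time="2024-06-07T07:34:31Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
time="2024-06-07T07:34:36Z" level=error msg="Failed to write response." error="dcgmGetValuesSince_v2 failed with error code -16"
'''

if __name__ == "__main__":
    for err, n in summarize_errors(SAMPLE).items():
        print(n, err)
```

Run over the full log, a single key in the result would indicate that every failed scrape hits the same `dcgmGetValuesSince_v2` error.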
CodeBrek commented 3 months ago

After we updated the driver to version 535.129.03, we could get data for DCGM_FI_DEV_CLOCK_THROTTLE_REASONS, but we still fail to get DCGM_EXP_CLOCK_EVENTS_COUNT.

MegaEle commented 3 months ago

Could you please provide the specific steps to reproduce the issue? We are able to consistently reproduce it using this configuration in our local Kubernetes cluster. Below is the output under the 535.129.03 driver.

# curl http://127.0.0.1:9400/metrics
# HELP DCGM_FI_DEV_CLOCK_THROTTLE_REASONS
# TYPE DCGM_FI_DEV_CLOCK_THROTTLE_REASONS gauge
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="0",UUID="GPU-886520a3-f4c4-4d1c-b5cc-655ca8b02758",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="0",UUID="GPU-886520a3-f4c4-4d1c-b5cc-655ca8b02758",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="0",UUID="GPU-886520a3-f4c4-4d1c-b5cc-655ca8b02758",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="0",UUID="GPU-886520a3-f4c4-4d1c-b5cc-655ca8b02758",device="nvidia0",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="1",UUID="GPU-30a7f778-4959-7159-a81d-ae3736eae509",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="1",UUID="GPU-30a7f778-4959-7159-a81d-ae3736eae509",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="1",UUID="GPU-30a7f778-4959-7159-a81d-ae3736eae509",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="1",UUID="GPU-30a7f778-4959-7159-a81d-ae3736eae509",device="nvidia1",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="2",UUID="GPU-910b9a75-35b6-8c88-0e34-81e2dfeb56a1",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="2",UUID="GPU-910b9a75-35b6-8c88-0e34-81e2dfeb56a1",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="2",UUID="GPU-910b9a75-35b6-8c88-0e34-81e2dfeb56a1",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="2",UUID="GPU-910b9a75-35b6-8c88-0e34-81e2dfeb56a1",device="nvidia2",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="3",UUID="GPU-6245016e-eba6-69c0-5ee7-9c17de988d31",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="3",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="3",UUID="GPU-6245016e-eba6-69c0-5ee7-9c17de988d31",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="4",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="3",UUID="GPU-6245016e-eba6-69c0-5ee7-9c17de988d31",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="5",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="3",UUID="GPU-6245016e-eba6-69c0-5ee7-9c17de988d31",device="nvidia3",modelName="NVIDIA A30",GPU_I_PROFILE="1g.6gb",GPU_I_ID="6",Hostname="hldy-data-k8s-gpu-a30-node0110.mt"} 1
failed to write response

And below is the output under the 470.82.01 driver.

$ curl http://127.0.0.1:9400/metrics
failed to write response

THX
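The two outputs above differ in how far the response gets before it fails. A small script can make that comparison explicit by listing the metric families that were written and flagging the failure line. This is a sketch only: `inspect_scrape` is a hypothetical helper, and it assumes the failure shows up as a literal `failed to write response` line in the body, as in the curl output above.

```python
FAILURE_MARKER = "failed to write response"


def inspect_scrape(body):
    """Return (metric_families, failed) for a /metrics response body."""
    families = []
    failed = False
    for line in body.splitlines():
        line = line.strip()
        if line == FAILURE_MARKER:
            failed = True
        elif line.startswith("# TYPE "):
            # "# TYPE <name> <type>": keep the metric family name
            parts = line.split()
            if len(parts) >= 3:
                families.append(parts[2])
    return families, failed


# Abbreviated from the 535.129.03 output above (labels trimmed for brevity).
SAMPLE = """\
# HELP DCGM_FI_DEV_CLOCK_THROTTLE_REASONS
# TYPE DCGM_FI_DEV_CLOCK_THROTTLE_REASONS gauge
DCGM_FI_DEV_CLOCK_THROTTLE_REASONS{gpu="0",GPU_I_ID="3"} 1
failed to write response
"""

if __name__ == "__main__":
    families, failed = inspect_scrape(SAMPLE)
    print("families written:", families)
    print("response failed:", failed)
```

The body can come from curl output saved to a file. On the 470.82.01 output the family list would be empty, while on 535.129.03 it would contain only DCGM_FI_DEV_CLOCK_THROTTLE_REASONS, which matches the observation that DCGM_EXP_CLOCK_EVENTS_COUNT is never written.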