NVIDIA / dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0

failed to transform metrics for transform 'podMapper' #378

Open jicki opened 3 months ago

What is the version?

3.3.7-3.5.0

What happened?

2024/08/21 09:32:30 http: superfluous response.WriteHeader call from github.com/NVIDIA/dcgm-exporter/pkg/dcgmexporter.(*MetricsServer).Metrics (server.go:124)
time="2024-08-21T09:32:30Z" level=error msg="Failed to write response." error="failed to transform metrics for transform 'podMapper'; err: failure getting pod resources; err: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5812454 vs. 4194304)"
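
For readers triaging this log: the second number in the error, 4194304 bytes, is exactly 4 MiB, which matches the default maximum receive message size of a grpc-go client. The pod resources response here was 5812454 bytes, so the client rejected it. The sketch below is only an illustration using the Go standard library; `parseSizes` and `sizeRe` are hypothetical helper names, not part of dcgm-exporter. It pulls the two sizes out of the log line and reports the overshoot. In grpc-go, a client can raise this limit with `grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(n))`; whether and where dcgm-exporter sets that option is not shown in this report.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// sizeRe matches the "(received vs. max)" size pair that appears in the
// ResourceExhausted message quoted in this issue.
var sizeRe = regexp.MustCompile(`received message larger than max \((\d+) vs\. (\d+)\)`)

// parseSizes (hypothetical helper) extracts the received size and the
// configured limit, both in bytes, from a single log line.
func parseSizes(line string) (received, limit int, err error) {
	m := sizeRe.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, fmt.Errorf("no gRPC size pair found in line")
	}
	if received, err = strconv.Atoi(m[1]); err != nil {
		return 0, 0, err
	}
	if limit, err = strconv.Atoi(m[2]); err != nil {
		return 0, 0, err
	}
	return received, limit, nil
}

func main() {
	// Tail of the error message from the report above.
	line := `rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5812454 vs. 4194304)`

	received, limit, err := parseSizes(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}

	const mib = 1024 * 1024
	fmt.Printf("received: %d bytes (%.2f MiB)\n", received, float64(received)/mib)
	fmt.Printf("limit:    %d bytes (%.2f MiB)\n", limit, float64(limit)/mib)
	fmt.Printf("over by:  %d bytes\n", received-limit)
}
```

The size of the pod resources response generally grows with the number of pods and devices on the node, so a node with many pods is the likely trigger, though the report does not include the environment details needed to confirm that.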

What did you expect to happen?

none

What is the GPU model?

NVIDIA-SMI 535.129.03, Driver Version: 535.129.03, CUDA Version: 12.2

What is the environment?

No response

How did you deploy the dcgm-exporter and what is the configuration?

No response

How to reproduce the issue?

No response

Anything else we need to know?

No response