GoogleCloudPlatform / container-engine-accelerators

Collection of tools and examples for managing Accelerated workloads in Kubernetes Engine
Apache License 2.0

Fix two bugs related to metrics #404

Closed grac3gao closed 2 months ago

grac3gao commented 2 months ago

Fix two bugs related to metrics:

  1. MIG GPUs do not support metrics, so metrics collection should skip them.
  2. When a container has multiple GPUs, only one GPU's metrics are collected in each round of metrics collection. Metrics should be exposed for all GPUs.
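
The two fixes can be illustrated with a minimal sketch. This is not the code from this PR: the type `gpuSample`, the helpers `isMIGDevice` and `collectContainerSamples`, the `read` callback, and the assumption that MIG partition IDs contain `/gi` (for example `nvidia0/gi0`) are all hypothetical and chosen only for illustration. The idea is to loop over every device assigned to the container rather than stopping after the first, and to skip MIG devices before attempting to read metrics.

```go
package main

import (
	"fmt"
	"strings"
)

// gpuSample is a hypothetical per-device metric sample.
type gpuSample struct {
	DeviceID  string
	DutyCycle uint
}

// isMIGDevice reports whether a device ID looks like a MIG partition.
// Assumption for this sketch: MIG partition IDs contain "/gi",
// for example "nvidia0/gi0"; the real check may differ.
func isMIGDevice(id string) bool {
	return strings.Contains(id, "/gi")
}

// collectContainerSamples returns one sample per non-MIG device assigned
// to a container. The read callback stands in for the real metrics query.
func collectContainerSamples(deviceIDs []string, read func(string) (uint, error)) []gpuSample {
	samples := make([]gpuSample, 0, len(deviceIDs))
	for _, id := range deviceIDs {
		if isMIGDevice(id) {
			// Fix 1: MIG devices do not support metrics, so skip them.
			continue
		}
		value, err := read(id)
		if err != nil {
			// One failing device should not hide metrics from the others.
			continue
		}
		// Fix 2: append a sample for every device instead of keeping only one.
		samples = append(samples, gpuSample{DeviceID: id, DutyCycle: value})
	}
	return samples
}

func main() {
	// Fake reader used only to exercise the loop.
	fakeRead := func(id string) (uint, error) { return 50, nil }
	devices := []string{"nvidia0", "nvidia1", "nvidia2/gi0"}
	for _, s := range collectContainerSamples(devices, fakeRead) {
		fmt.Printf("device=%s duty_cycle=%d\n", s.DeviceID, s.DutyCycle)
	}
}
```

With the fake reader above, the container's two full GPUs each produce a sample and the MIG partition is skipped.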