Azure / Moneo

Distributed AI/HPC Monitoring Framework
MIT License

Moneo refresh #74

Closed by rafsalas19 8 months ago

rafsalas19 commented 8 months ago
RyoYang commented 8 months ago
  1. Could you also update the ETH device argument and sample rate in the container configuration? See https://github.com/Azure/Moneo/blob/70c0a2d75355d82909784c886ed0fc169a49a033/dockerfile/moneo-exporter-nvidia_entrypoint.sh#L17C1-L17C37
  2. In the current configure_service.sh, when the managed Prometheus method is used (that is, when the argument is left empty), the script still installs Azure Monitor packages that this method does not need. See https://github.com/Azure/Moneo/blob/70c0a2d75355d82909784c886ed0fc169a49a033/linux_service/configure_service.sh#L35C1-L36C60
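
To illustrate the second point, here is a minimal sketch of the kind of guard that would skip the Azure Monitor packages when the argument is empty. It is not the actual contents of configure_service.sh: the function name `select_packages`, the argument value `azure_monitor`, and the package group names are all hypothetical placeholders chosen for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: none of these names are taken from configure_service.sh.

# Print the package groups to install for a given publisher argument.
# An empty argument is assumed to stand for the managed Prometheus method.
select_packages() {
    local publisher="${1:-}"
    # Placeholder for dependencies that every mode needs.
    local packages="base-exporter-deps"
    if [ -n "$publisher" ]; then
        # Add the Azure Monitor dependencies only when a publisher was requested.
        packages="$packages azure-monitor-deps"
    fi
    echo "$packages"
}

# Usage: empty argument (managed Prometheus) versus an explicit publisher.
select_packages ""
select_packages "azure_monitor"
```

The point of the guard is that the empty-argument path installs only the shared dependencies, so a managed Prometheus setup does not pull in packages it never uses.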
rafsalas19 commented 8 months ago

Addressed

afragop72 commented 7 months ago

Hello

I am new to using Moneo to monitor our Azure HPC cluster. We are using Headless Managed Grafana to visualize metrics from GPUs.

How do we enable GPU profiling metrics collection?

Thanks