Open rithinskaria opened 10 months ago
Significant caveats: I'm just a passer-by, not related to MS or this repo, and I'm new to Telegraf. You may already know what I'm saying, but either way it was good learning for me :)
So when you say "can see 10 vms only", it's not clear which VMs those are.
Metrics are collected in two ways:
For NSX:
main.py collects the VM info here. You can add debug output there to show which nodes are and are not being identified, and report back.
For vsphere:
vSphere monitoring is configured here, and the plugin reference is here.
I do note that the syntax in the azure file differs from the readme ([""] vs ["/host/**"]). You could specify the config per the plugin readme and see if it makes a difference.
Hope that helps
I'm a little unclear on what you are trying to do. The CSV is only used for NSX metrics. All of the vSphere metrics are read directly by Telegraf via the vsphere plugin and then forwarded to Azure Monitor via the output plugin. The NSX metrics are written to CSV, then read by the input plugin and sent to Azure Monitor via the output plugin. The NSX metrics do not include any VM metrics, only Edge metrics.
@khensler But I do see some VM names in the CSV besides the HCX and NSX VM names, so I was under the impression that all the VM metrics, including the VM names, would be written to the CSV.
Anyways, I still can't find all VMs in Azure Monitor.
@adeturner Thanks, I am also new to Telegraf and exploring options to retrieve metrics to Azure Managed Grafana. This is running in an Azure VM which has access to the AVS environment and traffic is allowed via firewall.
I will post the architecture and config.
I have this deployed in my environment; however, I am not able to pull metrics from all VMs. I can see a couple of user VMs, plus the HCX and NSX VMs. The environment has 20+ VMs and I am getting only 10. I tried putting the AVS credentials directly in telegraf.conf, but even then I can't collect the full set of metrics. I tested a different Telegraf configuration (using vm_metric_include) with InfluxDB, and I was able to collect all metrics.
Since I am using Azure Managed Grafana, I can't reach InfluxDB over its private IP, as Managed Grafana doesn't support a managed private endpoint to access VMs. I deployed another Grafana on-premises with an InfluxDB data source, and it works fine. From an observability standpoint, managing two Grafana instances doesn't make sense. If the metrics land in Azure Monitor, I can easily parse and transform them rather than writing complex InfluxDB queries.
In my current configuration, I hardcoded the AVS resource ID, region, and credentials. Any pointers? When I run telegraf, I can see logs where it states "Found 11 metrics for vm-01" and this vm-01 never reached the CSV.
Any idea how I can fix this?
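One way to narrow this down is to compare the VM names Telegraf reports in its log against the names that actually show up downstream (the CSV or Azure Monitor). A minimal Python sketch of that check follows. It assumes the log lines contain the text quoted above ("Found N metrics for <name>"); the `D! [inputs.vsphere]` prefix in the sample, the function names, and the sample VM names are all hypothetical and only for illustration.

```python
import re

# Assumed log fragment, modelled on the message quoted in the post above.
# The real Telegraf log line may carry a different prefix or suffix.
FOUND_RE = re.compile(r"Found (\d+) metrics for (\S+)")


def vms_in_log(lines):
    """Return {name: metric_count} for every 'Found N metrics for X' line."""
    found = {}
    for line in lines:
        match = FOUND_RE.search(line)
        if match:
            found[match.group(2)] = int(match.group(1))
    return found


def missing_downstream(log_lines, downstream_names):
    """Names seen in the Telegraf log that never appeared downstream."""
    return sorted(set(vms_in_log(log_lines)) - set(downstream_names))


if __name__ == "__main__":
    # Hypothetical sample data; replace with the real log and the
    # VM names exported from the CSV or Azure Monitor.
    sample_log = [
        "D! [inputs.vsphere] Found 11 metrics for vm-01",
        "D! [inputs.vsphere] Found 9 metrics for vm-02",
    ]
    print(missing_downstream(sample_log, ["vm-02"]))
```

If the list comes back non-empty, the metrics are being collected but dropped somewhere after the input plugin, which would point at the output side rather than at vSphere discovery.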