Starttoaster / proxmox-exporter

A Prometheus metrics exporter for Proxmox VE
MIT License

Templates are counted as guests that seem to be down #40

Closed. ThomasWetzel closed this issue 2 months ago

ThomasWetzel commented 4 months ago

Hi,

thank you for your great work.

I noticed that a Proxmox template is counted as down.

Perhaps you could exclude templates?

Thank you,
Thomas

Starttoaster commented 4 months ago

Hey Thomas, thanks for the kind words. Yeah, I actually noticed that myself recently and was thinking a bit about it. I guess from Proxmox's perspective, a template isn't much different from a stopped VM. But I agree that templates should be treated differently from stopped VMs, and I can see counting them as stopped causing problems for alerting rules anyway. So I'll make the exporter exclude them. Thanks for the Issue!

Starttoaster commented 4 months ago

Somewhat surprisingly, I'm not seeing a field in the API for a VM instance that says whether it's a template or a normal VM: https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/qemu

I'll keep looking through the documentation; I'd assume Proxmox exposes that information in the API somewhere. But if it takes an additional API request per VM to discover whether it's a template, I might make it opt-in via a flag or environment variable.

Starttoaster commented 4 months ago

Ah, their /cluster/resources endpoint exposes that information. It's not documented, but qemu resources have a hidden template field that is set to 1 for templates and 0 for normal VMs: https://pve.proxmox.com/pve-docs/api-viewer/index.html#/cluster/resources

I can probably carve out some time to add that API request tomorrow. It looks like only one extra API request per page load, so it's cheap enough to make the default here.
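A minimal Go sketch of reading that undocumented template field, assuming the usual PVE API envelope ({"data": [...]}) and the field names type, vmid, and template as seen in this thread. The clusterResource type and templateVMIDs function are hypothetical names, not the exporter's actual code, and fetching and authenticating the request are left out. Building the set once per scrape means each guest lookup afterwards is a cheap map read.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterResource holds only the fields this sketch reads from each entry
// returned by /cluster/resources (field names are assumptions). Fields not
// listed here are ignored by the JSON decoder.
type clusterResource struct {
	Type     string `json:"type"`
	VMID     int    `json:"vmid"`
	Name     string `json:"name"`
	Template int    `json:"template"` // 1 = template; 0 or absent = normal guest
}

// clusterResourcesResponse assumes the PVE API wraps results in "data".
type clusterResourcesResponse struct {
	Data []clusterResource `json:"data"`
}

// templateVMIDs (hypothetical) returns the set of qemu VMIDs flagged as templates.
func templateVMIDs(body []byte) (map[int]bool, error) {
	var resp clusterResourcesResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	templates := make(map[int]bool)
	for _, r := range resp.Data {
		if r.Type == "qemu" && r.Template == 1 {
			templates[r.VMID] = true
		}
	}
	return templates, nil
}

func main() {
	body := []byte(`{"data":[
		{"type":"qemu","vmid":8100,"name":"ubuntu22-cloud","template":1},
		{"type":"qemu","vmid":101,"name":"web","template":0},
		{"type":"storage","storage":"local"}
	]}`)
	templates, err := templateVMIDs(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(templates[8100], templates[101]) // true false
}
```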

Starttoaster commented 4 months ago

I believe the latest version of the exporter fixes the behavior for templates showing up as stopped VMs. Let me know if you get a chance to update and can confirm.

That fix only works for clustered PVE nodes, though, because the endpoint I found that indicates whether a VM is a template is a cluster-level one. I looked for a node-level endpoint but didn't find one at a glance. I'll probably take another peek later.
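To illustrate that limitation, here is a hypothetical Go sketch (the guest type and countDown function are made up for illustration): guests listed as not running are counted as down unless their VMID is in a template set taken from the cluster-level lookup. On a standalone node, where that lookup is unavailable, the set is nil, so nothing is excluded and templates still show up as down.

```go
package main

import "fmt"

// guest is a hypothetical, minimal view of a guest as listed per node.
type guest struct {
	VMID   int
	Status string // e.g. "running" or "stopped"
}

// countDown counts guests that are not running, skipping any VMID found in
// templates. A nil map excludes nothing, because reading a missing key from a
// nil map in Go returns false without panicking.
func countDown(guests []guest, templates map[int]bool) int {
	down := 0
	for _, g := range guests {
		if templates[g.VMID] {
			continue // known template: not reported as down
		}
		if g.Status != "running" {
			down++
		}
	}
	return down
}

func main() {
	guests := []guest{{8100, "stopped"}, {101, "running"}, {102, "stopped"}}
	fmt.Println(countDown(guests, map[int]bool{8100: true})) // 1 (clustered: template skipped)
	fmt.Println(countDown(guests, nil))                     // 2 (standalone: template counted)
}
```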

ThomasWetzel commented 4 months ago


Hi,

I've updated the Helm chart to the latest release.

This is part of the Proxmox response:

    {
        "maxdisk": 2361393152,
        "diskwrite": 0,
        "netin": 0,
        "uptime": 0,
        "netout": 0,
        "name": "ubuntu22-cloud",
        "template": 1,
        "maxmem": 2147483648,
        "diskread": 0,
        "cpu": 0,
        "status": "stopped",
        "serial": 1,
        "vmid": 8100,
        "disk": 0,
        "cpus": 2,
        "mem": 0
    }

The template field is set to 1, so this guest should not be counted. But somehow it is still counted as down.

How can I help you to find out what might be wrong?

Thomas

ThomasWetzel commented 4 months ago

Hi,

the problem is solved. It was an issue with the visualization in Grafana.

ThomasWetzel commented 4 months ago

The Prometheus query for guests that are down should be:

    clamp_min(sum(proxmox_guest_up{cluster=~"$cluster",type="qemu"} == 0), 0)

Starttoaster commented 3 months ago

Hey, thanks for bearing with me on the Grafana part! I missed the re-open of this Issue for a while. And when I found it, I was in the middle of some pressing issues at work, so I put it off until a calmer time. I updated the Grafana JSON in main with your version of the query. In your opinion, does the same feedback apply to any of the other stats at the top of the dashboard, like down LXCs? I don't really use LXC at all.

Starttoaster commented 3 months ago

Hey, sorry for the time that passed between your reply and my fix! Since you haven't replied to this Issue, I'm going to assume it's fully resolved for you. Please open a new Issue or comment back/re-open this one if you have any other suggestions or issues with the exporter :)

ThomasWetzel commented 3 months ago

Hey, I have two Proxmox clusters running. In my second cluster, I have some LXCs running and some powered down.

Is it possible to somehow configure more than one Proxmox cluster? I'd be happy to report back on whether your JSON works with LXCs, too.

Starttoaster commented 3 months ago

Is it possible to somehow configure more than one Proxmox cluster?

To be honest, I think supporting multiple clusters in one exporter would add more code than it's worth maintaining. I would suggest running multiple installations of this exporter, one for each Proxmox cluster.