charlie-haley / omada_exporter

Prometheus Exporter for TP-Link Omada Controller SDN.
MIT License
94 stars 16 forks source link

fix: don't try to return metrics after collection error #59

Closed patrickeasters closed 1 year ago

patrickeasters commented 1 year ago

I discovered a bug where the exporter can panic when it fails to poll metrics from the Omada controller API. I discovered this when running on a Raspberry Pi that had a short network blip.

{"level":"error","error":"Get \"https://192.168.1.2/52727821f5fde36cb714b01a01275f7c/api/v2/loginStatus\": dial tcp 192.168.1.2:443: connect: network is unreachable","time":"2023-01-16T19:54:08-05:00","message":"Failed to get controller"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x42704c]
goroutine 896 [running]:
github.com/charlie-haley/omada_exporter/pkg/collector.(*controllerCollector).Collect(0x1aa8580, 0x187e080)
/Users/patrick/git/omada_exporter/pkg/collector/controller.go:32 +0xcc
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
/Users/patrick/go/pkg/mod/github.com/prometheus/client_golang@v1.9.0/prometheus/registry.go:446 +0x118
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
/Users/patrick/go/pkg/mod/github.com/prometheus/client_golang@v1.9.0/prometheus/registry.go:538 +0x9f8

It appears that in every case where API client Get methods return an error, that it also returns nil data. Rather than the collectors trying to dereference a nil pointer, it seems to make more sense to return without metrics in this case. This allows for the process to remain running and recover after facing an error that makes the Omada API temporarily unavailable.

charlie-haley commented 1 year ago

Thanks for the PR! @patrickeasters