bryanklewis / prometheus-eventhubs-adapter

Use Azure Event Hubs as a remote storage for Prometheus
Apache License 2.0

Reset EventHub configuration in case of error #24

Closed aittam closed 2 years ago

aittam commented 2 years ago

Re-configure the EventHub connection when an error occurs. EventHub sporadically changes its IP address, and without a reconfiguration the adapter gets stuck connecting to the old IP.

We encountered this problem several times on an EventHub with auto-inflate enabled. After some weeks of working correctly, all our prometheus-eventhubs-adapter pods were stuck with this error:

{"level":"error","error":"read tcp 10.240.0.13:41830->20.42.68.88:5671: read: connection reset by peer","timestamp":"2021-12-06T10:10:52Z","message":"send event batch"}
{"level":"error","error":"read tcp 10.240.0.13:41830->20.42.68.XX:5671: read: connection reset by peer","timestamp":"2021-12-06T10:10:52Z","message":"send event batch"}
{"level":"error","error":"read tcp 10.240.0.13:41830->20.42.68.XX:5671: read: connection reset by peer","timestamp":"2021-12-06T10:10:58Z","message":"send event batch"}
{"level":"error","error":"read tcp 10.240.0.13:41830->20.42.68.XX:5671: read: connection reset by peer","timestamp":"2021-12-06T10:10:58Z","message":"send event batch"}
{"level":"error","error":"read tcp 10.240.0.13:41830->20.42.68.XX:5671: read: connection reset by peer","timestamp":"2021-12-06T10:11:02Z","message":"send event batch"}
{"level":"error","error":"read tcp 10.240.0.13:41830->20.42.68.XX:5671: read: connection reset by peer","timestamp":"2021-12-06T10:11:02Z","message":"send event batch"}

We discovered that EventHub had changed its IP address and we were still pointing to the old one.

The idea in this PR is to force a new configuration of the EH client so that it resolves the DNS name in the connection string again.
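As a rough illustration of that idea (not the actual code in this PR), the reset-on-error pattern could be sketched as below. The `sender` interface, the `factory` function, and `resettingSender` are hypothetical names that stand in for the Event Hubs client and for whatever builds it from the connection string. `flakySender` is a demo stub only. Context handling is left out to keep the sketch short.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// sender is a hypothetical stand-in for the Event Hubs client.
type sender interface {
	Send(payload []byte) error
	Close() error
}

// factory builds a fresh sender. In a real adapter this is assumed to
// re-create the client from the connection string, so that the host name
// is resolved again on the next connection attempt.
type factory func() (sender, error)

// resettingSender drops its client after any send error and builds a new
// one on the next call.
type resettingSender struct {
	mu      sync.Mutex
	build   factory
	current sender
}

func newResettingSender(build factory) *resettingSender {
	return &resettingSender{build: build}
}

// Send forwards the payload to the current client. On failure the client is
// closed and discarded; the error is returned so the caller can retry.
func (r *resettingSender) Send(payload []byte) error {
	r.mu.Lock()
	defer r.mu.Unlock()

	if r.current == nil {
		s, err := r.build()
		if err != nil {
			return fmt.Errorf("build sender: %w", err)
		}
		r.current = s
	}

	err := r.current.Send(payload)
	if err != nil {
		// Best-effort close; the stale connection is abandoned either way.
		_ = r.current.Close()
		r.current = nil
	}
	return err
}

// flakySender is a demo stub that fails a fixed number of sends.
type flakySender struct {
	failures int
	closed   bool
}

func (f *flakySender) Send(payload []byte) error {
	if f.failures > 0 {
		f.failures--
		return errors.New("connection reset by peer")
	}
	return nil
}

func (f *flakySender) Close() error {
	f.closed = true
	return nil
}
```

In this sketch the failed batch is not resent by the wrapper itself; it relies on the caller (here, Prometheus remote write) retrying, at which point a new client is built.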

bryanklewis commented 2 years ago

Looks similar to this issue with the SDK itself, not seeing a resolution in this thread. https://github.com/Azure/azure-event-hubs-go/issues/80

aittam commented 2 years ago

> Looks similar to this issue with the SDK itself, not seeing a resolution in this thread. Azure/azure-event-hubs-go#80

Yep, it looks like a similar issue, though it was closed years ago. From the comments and my tests, it doesn't seem to be resolved, and the approach in the first comment looks very much like what I am doing here.