bryanklewis / prometheus-eventhubs-adapter

Use Azure Event Hubs as a remote storage for Prometheus
Apache License 2.0

event hub adapter intermittently throwing authorization error #31

Closed kalyanteja19 closed 11 months ago

kalyanteja19 commented 12 months ago

We are getting the error below intermittently, even though the connection string is correct. This is observed on all of our AKS clusters.

Can you please help us figure out what the issue could be?

Environment: AKS
Prometheus: 2.46
Eventhub adapter: 0.5.0

Error:

{"level":"info","version":"v0.5.0","commit":"bfe73d16928d35c01f4daec480f6e791d6a67985","build":"20220314.5","timestamp":"2023-11-08T18:14:52Z","message":"prometheus-eventhubs-adapter starting"}

{"level":"error","error":"While parsing config: (1, 1): parsing error: keys cannot contain \u007f character","timestamp":"2023-11-08T18:14:52Z","message":"Error loading config file"}

{"level":"info","timestamp":"2023-11-08T18:14:52Z","message":"listening and serving HTTP on 0.0.0.0:9201"}

{"level":"error","error":"received detach frame link detached, reason: *Error{Condition: amqp:link:detach-forced, Description: The link '8KCg0rM_H08' is force detached. Code: RenewToken. Details: Unauthorized access. 'Send' claim(s) are required to perform this operation. Resource: 'sb://xxxxxxxxxx.servicebus.windows.net/eh-xxxxxxxxxxxxx'.. TrackingId:475b57d995854758ad25bf8a6c194ce7_G1S3, SystemTracker:gateway5, Timestamp:2023-11-09T21:54:58, Info: map[]}","timestamp":"2023-11-09T21:54:58Z","message":"send event"}

{"level":"info","timestamp":"2023-11-09T21:54:58Z","message":"Resetting EventHub Configuration"}

kalyanteja19 commented 12 months ago

@bryanklewis, Could you please help here?

bryanklewis commented 12 months ago

Some light googling says that's a generic JSON/TOML error related to a malformed document. I see it across a lot of projects that parse JSON, so I think we could start there. Can you post your config file (with the connection string parts obscured) so we can see if anything is missing? Maybe there's a hidden character or something like that. I'll keep looking to see if the Viper framework has anything specific related to this.
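
For anyone who can read their config file, the hidden-character check suggested here can be sketched in a few lines of Python. This is only an illustration, not part of the adapter: the function name `find_control_chars` and the sample TOML keys are hypothetical. It reports control and format characters such as U+007F (DEL), the one named in the parse error.

```python
import unicodedata


def find_control_chars(text):
    """Return (line, column, codepoint) for each hidden character found.

    Flags Unicode categories Cc (control, e.g. U+007F DEL) and Cf (format,
    e.g. a byte-order mark). Tabs and line endings are not reported.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # tolerate Windows line endings
        for col_no, ch in enumerate(line, start=1):
            if ch == "\t":
                continue
            if unicodedata.category(ch) in ("Cc", "Cf"):
                hits.append((line_no, col_no, f"U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__":
    # Hypothetical config content with a DEL character at line 1, column 1,
    # the same position the parser reported in the log above.
    sample = '\x7flog_level = "info"\nwrite_path = "/write"\n'
    for line_no, col_no, codepoint in find_control_chars(sample):
        print(f"line {line_no}, column {col_no}: {codepoint}")
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the result to the function. An empty result would suggest the parse error comes from somewhere else.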

kalyanteja19 commented 12 months ago

Thank you so much @bryanklewis for the quick reply. I am not authorized to view the /etc/prometheus-eventhubs-adapter/prometheus-eventhubs-adapter.toml file inside the Event Hub adapter pod.

But here are the values that we are passing to EventHub adapter. Kindly advise on this issue.

ADAP_LISTEN_ADDRESS: 0.0.0.0:9201
ADAP_LOG_LEVEL: info
ADAP_PARTITION_KEY_LABEL: name
ADAP_READ_TIMEOUT: 10s
ADAP_WRITE_BATCH: "true"
ADAP_WRITE_PATH: /write
ADAP_WRITE_RAW: "true"
ADAP_WRITE_SERIALIZER: json
ADAP_WRITE_TIMEOUT: 20s
ADAP_WRITE_CONNSTRING=Endpoint=sb://xxxxxx.servicebus.windows.net/;SharedAccessKeyName=xxxxxxxx;SharedAccessKey=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx;EntityPath=xxxxxxxxxxxxxxxxxxxx
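
As a quick sanity check on the connection string format, here is a rough Python sketch that splits the value into its parts and reports any that are missing. It is not how the adapter itself parses the string. The names `parse_conn_string`, `missing_keys` and `REQUIRED_KEYS` are hypothetical, and the list of required keys is an assumption based on the four keys shown in this thread.

```python
# Assumption: these are the four keys that appear in the connection string
# posted in this thread. The adapter's real requirements may differ.
REQUIRED_KEYS = ("Endpoint", "SharedAccessKeyName", "SharedAccessKey", "EntityPath")


def parse_conn_string(raw):
    """Split 'Key=Value;Key=Value' into a dict, tolerating surrounding quotes."""
    value = raw.strip()
    # Remove one pair of matching quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    parts = {}
    for segment in value.split(";"):
        if not segment:
            continue  # ignore a trailing ';'
        # Split on the first '=' only: base64 keys often end in '=' padding.
        key, sep, val = segment.partition("=")
        if not sep:
            raise ValueError(f"segment without '=': {segment!r}")
        parts[key.strip()] = val
    return parts


def missing_keys(parts):
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not parts.get(k)]


if __name__ == "__main__":
    # Placeholder values only, not real credentials.
    example = (
        '"Endpoint=sb://example.servicebus.windows.net/;'
        "SharedAccessKeyName=send-policy;"
        'SharedAccessKey=abc123==;EntityPath=eh-example"'
    )
    print(missing_keys(parse_conn_string(example)))
```

A well-formed string only rules out formatting problems. It does not show whether the shared access policy named in `SharedAccessKeyName` actually carries the 'Send' claim that the error message asks for; that has to be checked on the Event Hubs side.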

bryanklewis commented 12 months ago

Does putting double (") or single (') quotes around your ADAP_WRITE_CONNSTRING value make any difference? For example:

ADAP_WRITE_CONNSTRING="Endpoint=sb://xxxxxx.servicebus.windows.net/;SharedAccessKeyName=xxxxxxxx;SharedAccessKey=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx;EntityPath=xxxxxxxxxxxxxxxxxxxx"

kalyanteja19 commented 12 months ago

Yes, we thought of that too. But the behaviour described above is not consistent, and the authorization issue is intermittent.

Here are the steps/details:

  1. Prometheus and the Event Hub adapter are deployed onto the AKS cluster.
  2. For some time, no errors are observed in the Event Hub adapter or Prometheus.
  3. After some time, this authorization error appears in the Event Hub adapter, and Prometheus also shows connectivity errors to Event Hub.
  4. We then have to restart Prometheus, after which it works for some time. The issue recurs after a while.

bryanklewis commented 12 months ago

Oh, that helps a lot. Re-reading your original error along with that, I think what you are seeing is a result of the AMQP client disconnect/reconnect process. The logging is kind of verbose, so my question is: are you actually seeing a stop in data flow, or do the messages just get logged while the adapter continues to send data? In other words, is it just spitting out log errors but continuing to function?
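
One way to answer that question is to pull the error entries out of the adapter's JSON log lines and compare their timestamps with any gaps in the metrics. The sketch below is an illustration only: the function name `summarize_errors` is hypothetical, and it assumes each log line is a JSON object with `level`, `message` and `timestamp` fields, as in the log excerpt at the top of this issue.

```python
import json
from collections import Counter


def summarize_errors(log_lines):
    """Count error-level entries per message and record when each first appeared.

    Lines that are blank or not valid JSON objects are skipped.
    """
    counts = Counter()
    first_seen = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        message = entry.get("message", "")
        counts[message] += 1
        first_seen.setdefault(message, entry.get("timestamp"))
    return counts, first_seen


if __name__ == "__main__":
    # Abbreviated versions of the log lines posted in this issue.
    sample = [
        '{"level":"info","timestamp":"2023-11-08T18:14:52Z","message":"listening and serving HTTP on 0.0.0.0:9201"}',
        '{"level":"error","error":"received detach frame link detached","timestamp":"2023-11-09T21:54:58Z","message":"send event"}',
        '{"level":"info","timestamp":"2023-11-09T21:54:58Z","message":"Resetting EventHub Configuration"}',
    ]
    counts, first_seen = summarize_errors(sample)
    for message, count in counts.items():
        print(f"{message}: {count} error(s), first at {first_seen[message]}")
```

If "send event" errors line up with the periods where metrics are missing, that points to real data loss rather than just noisy logging.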

kalyanteja19 commented 12 months ago

@bryanklewis Some metrics are missing. Once we restart Prometheus, the metrics flow again. Then, after some time, the Event Hub adapter starts showing the error again.

bryanklewis commented 11 months ago

OK, try two things:

  1. Add quotes around all your config values, just to eliminate any parsing error.
  2. Use the latest release, v0.5.3. I've updated the AMQP connection library, which in turn updates the auth library from v3 to v4. Let's see if the newer SDK library contains better connection logic.

kalyanteja19 commented 11 months ago

@bryanklewis, thank you so much. We will try upgrading and update you. Please note that our deployment process will need some time. I will try it in a sandbox and update here.

kalyanteja19 commented 11 months ago

@bryanklewis, this is taking time due to some other changes, so please close this issue. Thank you so much for your support :)