Open alexg-axis opened 1 year ago
It seems to happen on a weekly basis. That could mean that Azure enforces some sort of 7-day timeout and that we should reconnect gracefully when it occurs.
Some information from the Python library.
All Event Hubs exceptions are wrapped in an [EventHubError][EventHubError]. They often have an underlying AMQP error code which specifies whether an error should be retried. For retryable errors (i.e. `amqp:connection:forced` or `amqp:link:detach-forced`), the client libraries will attempt to recover from these errors based on the [retry options][AmqpRetryOptions] specified when instantiating the client. To configure retry options, follow the sample [Client Creation][ClientCreation]. If the error is non-retryable, there is some configuration issue that needs to be resolved.
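To make the retryable/non-retryable split concrete, here is a minimal Go sketch of how a client could classify an AMQP error condition before deciding whether to retry. `isRetryable` and `retryableConditions` are hypothetical names, and the set only contains the two conditions named in the quote above; the actual SDKs may well treat more conditions as retryable.

```go
package main

import "fmt"

// retryableConditions holds only the two AMQP error conditions named in the
// quoted documentation. This is an assumption for illustration; the real SDKs
// may treat additional conditions as retryable.
var retryableConditions = map[string]bool{
	"amqp:connection:forced":  true,
	"amqp:link:detach-forced": true,
}

// isRetryable (hypothetical helper) reports whether the given AMQP error
// condition is in the set above.
func isRetryable(condition string) bool {
	return retryableConditions[condition]
}

func main() {
	// The second condition is a standard AMQP one that is simply not in the set.
	for _, c := range []string{"amqp:link:detach-forced", "amqp:unauthorized-access"} {
		fmt.Printf("%s retryable=%v\n", c, isRetryable(c))
	}
}
```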
We believe the following code is the cause: once a link is detached, nothing retries to get a session and link going again.
https://github.com/amenzhinsky/iothub/blob/master/iotservice/client.go#L171-L189
Note how, when putting a token fails, we just return and never try again. Most likely we then become unauthorized, get kicked by the server, and the link is detached.
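One possible direction, as a sketch only: instead of returning on the first failure, wrap the token renewal in a bounded retry with exponential backoff. Nothing below is taken from the library; `retry`, `errTransient`, and the `putToken` closure are hypothetical stand-ins for whatever call actually renews the token.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient simulates a failed token renewal in the demo below.
var errTransient = errors.New("put token failed")

// retry runs op up to attempts times, doubling the delay after each failure.
// It returns nil on the first success, or the last error wrapped.
func retry(attempts int, baseDelay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := baseDelay
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// putToken is a hypothetical stand-in for the library's token renewal;
	// here it fails twice and then succeeds.
	putToken := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	err := retry(5, 10*time.Millisecond, putToken)
	fmt.Println("calls:", calls, "err:", err)
}
```

If every attempt fails, the caller still gets an error back and can then tear down and rebuild the session and link rather than silently stopping.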
I have an issue where I'm sometimes unable to publish events. Unfortunately, I can't identify any circumstances beyond that. It has occurred a few times, but in most cases publishing works as expected.
In essence the code works as follows:
The error is the following:
The Java SDK seems to have this comment regarding the error:
Same with the JS one: https://github.com/Azure/amqp-common-js/blob/master/lib/errors.ts#L171.
So it seems to me that this error may occur from time to time. For me, a restart has always solved it, so I assume one way to handle it is to simply reconnect the client.
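A rough sketch of what "simply reconnect the client" could look like without restarting the process. The `publisher` interface, the `reconnecting` wrapper, and `fakeClient` are all hypothetical and do not reflect the real client's API; the wrapper just discards a failed client, dials a new one, and retries the publish once.

```go
package main

import (
	"errors"
	"fmt"
)

// publisher is a hypothetical minimal interface; the real client's API differs.
type publisher interface {
	Publish(msg string) error
	Close() error
}

// reconnecting wraps a publisher and rebuilds it once when a publish fails.
type reconnecting struct {
	dial func() (publisher, error) // creates a fresh client
	cur  publisher                 // current client, nil until first use
}

// Publish sends msg, and on failure replaces the client and tries once more.
func (r *reconnecting) Publish(msg string) error {
	if r.cur == nil {
		p, err := r.dial()
		if err != nil {
			return err
		}
		r.cur = p
	}
	err := r.cur.Publish(msg)
	if err == nil {
		return nil
	}
	// Drop the broken client and retry once on a new connection.
	r.cur.Close()
	r.cur = nil
	p, derr := r.dial()
	if derr != nil {
		return fmt.Errorf("reconnect after %v failed: %w", err, derr)
	}
	r.cur = p
	return r.cur.Publish(msg)
}

// fakeClient simulates a client whose link may have been detached.
type fakeClient struct{ detached bool }

func (f *fakeClient) Publish(string) error {
	if f.detached {
		return errors.New("link detached")
	}
	return nil
}

func (f *fakeClient) Close() error { return nil }

func main() {
	dials := 0
	r := &reconnecting{dial: func() (publisher, error) {
		dials++
		// Only the first client is broken in this demo.
		return &fakeClient{detached: dials == 1}, nil
	}}
	fmt.Println("err:", r.Publish("hello"), "dials:", dials)
}
```

Note that retrying a publish this way can deliver a message twice if the first attempt actually reached the server before the link dropped, so it only fits consumers that tolerate duplicates.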