eclipse / paho.golang


Handle permanent publish queue errors in Autopaho #234

Open vishnureddy17 opened 6 months ago

vishnureddy17 commented 6 months ago

If the queue implementation in autopaho is in a permanently failed state, managePublishQueue() will continue retrying indefinitely.

Should there be some way for queue issues to be detected so autopaho can quit and surface the issue to the user?

MattBrittan commented 6 months ago

> If the queue implementation in autopaho is in a permanently failed state,

Could you please provide an example of a failed state? Messages really just move from the queue into the session (the exception is QoS 0, where a failure to transmit over the network leads to the message being retried).

I guess we could add a callback that is called before a message is retransmitted; this might be useful in other situations too (e.g. a message might have a deadline and, once that time has passed, it should not be retried).
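
A minimal sketch of what such a pre-retransmit callback could look like. Everything here is hypothetical and none of it is part of paho.golang: `Message`, `BeforeRetry`, `DropExpired`, and `retryPending` are names invented for illustration, and the real library works with its own packet types.

```go
package main

import "time"

// Message is a hypothetical stand-in for a queued publish (not a paho.golang type).
type Message struct {
	Topic    string
	Payload  []byte
	Deadline time.Time // zero value means "no deadline"
}

// BeforeRetry is a hypothetical callback consulted before each retransmission.
// Returning false tells the caller to drop the message instead of retrying it.
type BeforeRetry func(m Message, now time.Time) bool

// DropExpired is an example BeforeRetry: it allows the retry only when the
// message has no deadline or the deadline is still in the future.
func DropExpired(m Message, now time.Time) bool {
	return m.Deadline.IsZero() || now.Before(m.Deadline)
}

// retryPending returns the messages that the callback allows to be retried.
// A nil callback keeps every message, matching "always retry" behaviour.
func retryPending(msgs []Message, now time.Time, cb BeforeRetry) []Message {
	var keep []Message
	for _, m := range msgs {
		if cb == nil || cb(m, now) {
			keep = append(keep, m)
		}
	}
	return keep
}
```

A caller would invoke something like `retryPending(pending, time.Now(), DropExpired)` before each retransmission pass; passing the time in as an argument keeps the callback easy to test.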

vishnureddy17 commented 6 months ago

> Could you please provide an example of a failed state?

Hypothetically, what if the application is using a file-based queue and the underlying storage medium is disconnected or failed?

Or what if the user has a custom queue implementation that relies on a database connection, but that connection cannot be established?

Maybe the queue interface needs a way to signal a "permanent failure".
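
One way such a signal could be expressed in Go is a sentinel error that queue implementations wrap, which the consuming loop checks with `errors.Is`. The sketch below is an assumption about how this might look, not an existing API: `ErrPermanent`, `Permanent`, `IsPermanent`, `peeker`, `drain`, and `scriptedQueue` are all hypothetical names, and the loop is bounded by `maxAttempts` only so that the example terminates.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPermanent is a hypothetical sentinel; paho.golang does not define it.
var ErrPermanent = errors.New("queue: permanent failure")

// errTransient stands in for any recoverable queue error.
var errTransient = errors.New("queue: transient failure")

// Permanent marks err as unrecoverable by wrapping the sentinel.
func Permanent(err error) error { return fmt.Errorf("%w: %v", ErrPermanent, err) }

// IsPermanent reports whether err wraps ErrPermanent.
func IsPermanent(err error) bool { return errors.Is(err, ErrPermanent) }

// peeker is a simplified, hypothetical slice of a queue interface.
type peeker interface {
	Peek() (string, error)
}

// drain retries Peek on transient errors (up to maxAttempts) but gives up
// immediately when the queue signals a permanent failure.
func drain(q peeker, maxAttempts int) (string, error) {
	var lastErr error
	for i := 0; i < maxAttempts; i++ {
		msg, err := q.Peek()
		if err == nil {
			return msg, nil
		}
		if IsPermanent(err) {
			return "", fmt.Errorf("giving up after %d attempt(s): %w", i+1, err)
		}
		lastErr = err
	}
	return "", lastErr
}

// scriptedQueue returns the listed errors in order, then succeeds.
type scriptedQueue struct {
	errs  []error
	calls int
}

func (s *scriptedQueue) Peek() (string, error) {
	s.calls++
	if len(s.errs) == 0 {
		return "msg", nil
	}
	err := s.errs[0]
	s.errs = s.errs[1:]
	return "", err
}
```

With this shape, a queue backed by failed storage could return `Permanent(err)` and the caller could stop and surface the error, while ordinary errors would still be retried.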

MattBrittan commented 6 months ago

> Hypothetically, what if the application is using a file-based queue and the underlying storage medium is disconnected or failed? Maybe the queue interface needs a way to signal a "permanent failure".

I'm open to suggestions, but I'm not sure how far to go with this; if there is a hardware failure then I think that continually retrying may well be the right approach (when the issue is fixed, things will start working again). I guess that adding an error callback might help users detect the issue.

One way that a user could deal with this is to implement their own queue, and handle errors how they see fit; this may mean that if an error is detected Peek returns nil until it's resolved (perhaps Wait would retry every second). I think this might be a better option than us trying to come up with a one-size-fits-all solution within the main library.
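
A rough sketch of that user-side approach, under stated assumptions: the `backend` interface, `ErrEmpty`, `guardedQueue`, and `flakyBackend` below are hypothetical and much simpler than the queue interface autopaho actually expects. While the backend is failing, `Peek` reports the queue as empty and passes the real error to an optional callback, and `Wait` polls at a fixed interval (one second in the suggestion above) until `Peek` succeeds.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// ErrEmpty is a hypothetical "nothing to send" error for this sketch.
var ErrEmpty = errors.New("empty queue")

// backend is a hypothetical storage layer that may fail (file, database, ...).
type backend interface {
	Peek() ([]byte, error)
}

// guardedQueue hides backend failures from its consumer.
type guardedQueue struct {
	inner   backend
	retry   time.Duration // polling interval used by Wait
	onError func(error)   // optional; lets the application observe failures
}

// Peek returns the next item, or ErrEmpty when the backend is empty or failing.
func (g *guardedQueue) Peek() ([]byte, error) {
	b, err := g.inner.Peek()
	if err == nil {
		return b, nil
	}
	if !errors.Is(err, ErrEmpty) && g.onError != nil {
		g.onError(err)
	}
	return nil, ErrEmpty
}

// Wait returns a channel that is closed once Peek succeeds. It polls every
// g.retry, so a recovered backend is noticed within one interval.
func (g *guardedQueue) Wait() chan struct{} {
	out := make(chan struct{})
	go func() {
		defer close(out)
		for {
			if _, err := g.Peek(); err == nil {
				return
			}
			time.Sleep(g.retry)
		}
	}()
	return out
}

// flakyBackend fails a fixed number of times and then returns item.
type flakyBackend struct {
	mu       sync.Mutex
	failures int
	item     []byte
}

func (f *flakyBackend) Peek() ([]byte, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.failures > 0 {
		f.failures--
		return nil, errors.New("storage unavailable")
	}
	return f.item, nil
}
```

The trade-off of polling is latency: a new or recovered item is noticed only at the next tick, and the goroutine started by `Wait` runs until `Peek` succeeds, so a real implementation would likely also want a way to cancel it.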