eclipse / paho.golang

Go libraries
Other
327 stars 92 forks source link

Incorrect assumption about PUBLISH errors in autopaho can cause indefinite head-of-queue looping #214

Closed vishnureddy17 closed 7 months ago

vishnureddy17 commented 8 months ago

In autopaho, managePublishQueue assumes that if paho.PublishWithOptions() returns an error that is not paho.ErrNetworkErrorAfterStored, the error is either temporary or the connection will drop: https://github.com/eclipse/paho.golang/blob/a6def521ee1aecf0b1cef6bfc1a7022102457807/autopaho/auto.go#L531-L553

However, this is not always the case. All the errors returned in the code snippet below for paho.PublishWithOptions() are neither temporary nor imply a pending disconnection: https://github.com/eclipse/paho.golang/blob/a6def521ee1aecf0b1cef6bfc1a7022102457807/paho/client.go#L748-L762

As a result, managePublishQueue will loop indefinitely on the first entry in the queue in these cases.

Proposed Solution To solve this, I propose creating a custom error type called PahoArgumentError, which will be returned by paho.PublishWithOptions() in these cases (this error type will likely be relevant elsewhere in paho). Then autopaho can do a type assertion and quarantine any publish in the queue that returned PahoArgumentError.

MattBrittan commented 8 months ago

Agreed; I did consider doing something similar when introducing this, but felt that it was OK for an initial PR (hard to know when to stop and submit the PR). My thought was something like paho.FatalError (indicating that it's pointless to retry) because this could also apply if the broker returns something like "Payload format invalid"). Happy with either option (will try to get a bit more work done on this over the next couple of days).

vishnureddy17 commented 8 months ago

That sounds good, I also think it's worth returning such an error from Subscribe and Unsubscribe, which would be used to distinguish between errors that are due to a client failure/disconnection and errors that are due to something fatally wrong with the subscribe/unsubscribe

MattBrittan commented 7 months ago

I've had an initial go at this; went with your solution (it's probably useful to be able to differentiate between a range of "fatal errors"). Ideally this would be extended to look at broker response codes (but I don't think that is as urgent).

vishnureddy17 commented 7 months ago

closing due to #226, @MattBrittan feel free to reopen if you still think it's important to keep around

vishnureddy17 commented 6 months ago

Ideally this would be extended to look at broker response codes (but I don't think that is as urgent).

This is really only doable if #216 is addressed.

Also, even it was possible, I don't think that autopaho should retry if the ack came back with an error reason code. What if we're close to the server's receive maximum or we are about to run out of packet IDs? Then retrying could cause issues.