eclipse / paho.golang

Go libraries
Other
330 stars 92 forks source link

Potential deadlock in managePublishQueue / Consider how `paho.Publish` should handle errors #196

Open MattBrittan opened 10 months ago

MattBrittan commented 10 months ago

Describe the bug

In autopaho managePublishQueue we call PublishWithOptions and if an error is returned wait on the connection to drop before trying again (so we assume that if PublishWithOptions returns an error then it's fatal). This is not a valid assumption because publish may return at other times; for instance if c.PacketTimeout is exceeded.

Solving this is going to require a bit of thought and modifications to paho; consider publishQoS12:

func (c *Client) publishQoS12(ctx context.Context, pb *packets.Publish, o PublishOptions) (*PublishResponse, error) {
    c.debug.Println("sending QoS12 message")
    pubCtx, cf := context.WithTimeout(ctx, c.PacketTimeout)
    defer cf()

    ret := make(chan packets.ControlPacket, 1)
    if err := c.Session.AddToSession(pubCtx, pb, ret); err != nil {
        return nil, err
    }

So if we are attempting to send a lot of messages and c.Session.AddToSession blocks for, by default, 10 seconds then publishQoS12 returns an error without actually making an attempt to send the message (due to c.PacketTimeout). This makes it difficult for the caller to know what to do when it receives an error (should it retry or should it expect the connection to drop due to a protocol error).

I think there is a quick fix and a longer term project here:

Software used:

@master

MattBrittan commented 10 months ago

The fix I have just merged should prevent this from being an issue with autopaho. However it would be good to review this code further and see if there is a better way of handling things (this might include looking at what to do if the client gets stalled due to receive maximum depletion - should we drop the connection if this happens?).