eclipse / paho.mqtt.javascript

Client Timing Out After 2 Minutes #106

Closed MattyK14 closed 6 years ago

MattyK14 commented 7 years ago

I'm currently using the react-native fork to connect to AWS IoT. Two minutes into every connection, the client times out, even while messages are coming in or being published. In the connectionLost handler, I force a reconnect if the error type is a timeout, but after 3 more reconnections the error becomes Error: AMQJS0007E Socket error: Unknown socket error.

@rh389
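For readers following along, forcing a reconnect from the connectionLost handler could be sketched roughly as below. This is a minimal sketch, not code from the issue or from Paho: `reconnectDelayMs`, `createReconnector`, the `connect` callback, and the default delays are all hypothetical, and the only assumption about the client is that error code 0 means a clean disconnect.

```javascript
// Hypothetical helper (not part of Paho): delay before reconnect attempt n (0-based).
function reconnectDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  // Double the delay on each attempt, capped at maxMs.
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Hypothetical wrapper: `connect` is any function that returns a Promise,
// for example a call into the MQTT client's own connect method.
// `schedule` defaults to setTimeout and can be swapped out for testing.
function createReconnector(connect, schedule = setTimeout, maxAttempts = 5) {
  let attempt = 0;
  return function onConnectionLost(error) {
    // Assumption: errorCode 0 is a clean disconnect, so do nothing.
    if (error && error.errorCode === 0) return;
    // Stop retrying instead of looping forever on a persistent failure.
    if (attempt >= maxAttempts) return;
    const delay = reconnectDelayMs(attempt);
    attempt += 1;
    schedule(() => {
      connect().then(
        () => { attempt = 0; },                        // success: reset the counter
        () => onConnectionLost({ errorCode: -1 })      // failure: schedule the next try
      );
    }, delay);
  };
}
```

Spacing the attempts out and capping their number makes it easier to tell a transient timeout from a persistent socket error, since the handler no longer reconnects immediately every time.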

jpwsutton commented 7 years ago

Can you recreate this using the pure client library?

robhogan commented 7 years ago

Just on my phone so can't properly investigate right now but this could well be an issue with my fork as I haven't done much testing with the keep-alive mechanisms. Do you have a keepAliveInterval configured on the client, and the equivalent on the server?

Failing to reconnect after 3 timeouts is odd in any case though.

MattyK14 commented 7 years ago

I was on the default keepAliveInterval of 60 seconds. After experimenting with it myself, it seems that no matter what I set it to, every second keepAlive ping causes it to time out. So if I set it to 5 seconds, it times out after 10.

The failure to reconnect is odd. If I set the keepAliveInterval to 15 seconds, it will fail to reconnect on the 10th try. It seemed consistent yesterday on 60 seconds to fail on the 5th attempt to reconnect, but now the last two times it has failed on the 3rd attempt.

robhogan commented 7 years ago

@MattyK14 - I've just spotted that a fork of my fork by @clshortfuse has what looks like a better timer implementation (looks like he found a bug or two as well as using background timers). Perhaps you could give his fork a try?

@clshortfuse - mind if I pull in your commits if all goes well?

(PS: I've now enabled issues on my fork so any future issues specific to my RN version can be opened directly there)

clshortfuse commented 7 years ago

Go for it. I didn't have time to wrap it up, but it's working fine on my end so far. I did change the ping system IIRC, because I was having issues with the dual pinger system, so you have to watch for breaking changes.

The specific reason for changing to native timers was that the JS stack seems to pause when the activity is paused. On ChromeOS, the JS timers wouldn't fire unless the activity had window focus.

robhogan commented 7 years ago

Cheers. I think the rationale behind two timers is:

  1. By the spec, the client is responsible for sending a control packet (ping or otherwise) every keepAliveInterval seconds, otherwise the server disconnects us (after 1.5 * keepAliveInterval). So we use sendPinger to track when we've been quiet long enough that we must ping the server.

  2. If the server has been quiet for a while, we choose to use the receivePinger to make sure it's still there. If we don't get a response to our ping we disconnect. As far as I can tell, this is additional to the spec. The only use I can think of is to ensure that we close the socket (and so prepare for a reconnect) in cases where the socket is still functional but the server isn't all there.
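The two responsibilities above can be sketched as a single small state machine. This is a hedged illustration only: `KeepAliveTracker` and its method names are hypothetical and are not the sendPinger/receivePinger code in Paho or in either fork. It assumes that any inbound packet counts as proof that the server is alive, and that an unanswered ping is treated as a timeout after one further keep-alive interval.

```javascript
// Hypothetical sketch; class and method names are illustrative, not Paho's.
class KeepAliveTracker {
  constructor(keepAliveSeconds, now) {
    this.intervalMs = keepAliveSeconds * 1000;
    this.lastSent = now;               // last time we sent any control packet
    this.lastReceived = now;           // last time we heard anything from the server
    this.pingOutstandingSince = null;  // when an unanswered ping was sent, if any
  }

  // Call whenever any packet is written to the socket.
  onSend(now) {
    this.lastSent = now;
  }

  // Call whenever any packet arrives; assumption: any traffic proves liveness.
  onReceive(now) {
    this.lastReceived = now;
    this.pingOutstandingSince = null;
  }

  // Call periodically. Returns 'ping', 'timeout', or null (nothing to do).
  tick(now) {
    if (this.pingOutstandingSince !== null) {
      // A ping is in flight: give up once a full interval passes with no reply.
      return now - this.pingOutstandingSince >= this.intervalMs ? 'timeout' : null;
    }
    const quietOutbound = now - this.lastSent >= this.intervalMs;
    const quietInbound = now - this.lastReceived >= this.intervalMs;
    if (quietOutbound || quietInbound) {
      this.pingOutstandingSince = now;
      this.lastSent = now; // the ping itself is an outbound control packet
      return 'ping';
    }
    return null;
  }
}
```

The key difference from a self-resetting timer is that both timestamps are updated by real traffic, so a ping is sent only when one direction has actually been quiet, and a missing reply is acted on.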

Right now though it looks like the receivePinger is never reset, either in my fork or in the eclipse original. Hence https://github.com/clshortfuse/react-native-paho-mqtt/commit/b94ef43933b7860ed5800511615384bf776af996#diff-1983c3869382e68a08044cf44a806a41 presumably. But even then, we're not acting on a lack of response.

None of this really explains the problem @MattyK14 is seeing though ;). What server are you using @MattyK14? Are the messages getting through (both ways?) until the disconnect?

Edit: The receivePinger is never reset explicitly, but of course it resets itself every time it sends a ping, so in effect it looks like it's just a continuous pinger, which makes the sendPinger redundant.

@jpwsutton - any idea what the intention was here? It doesn't make sense to me.

clshortfuse commented 7 years ago

I'm actually using the react native fork by @rh389 for the same reason @MattyK14 is, namely, for Amazon IoT.

I'm more interested in the actual error code returned by connectionLost. I know that if you try to perform an action not permitted by your Amazon IoT Policy, it will boot you off the MQTT connection. Though, usually, you'll get a specific error code.

Also, are you sure you are randomizing your Client IDs properly? I believe Amazon will also boot an Mqtt session if somebody else connects with the same Client ID. My bet is on that.

Just for reference, this is the configuration I have working (with a few minor code changes):

var clientId = 'myappname-client-' + (Math.floor((Math.random() * 1000000) + 1));
console.log('Connecting to MQTT with client', clientId);
var client = new MqttClient({
  uri: url,
  clientId: clientId,
  storage: storage
});

client.on('connectionLost', (error) => {
  switch (error.errorCode) {
    case 0: // OK: clean disconnect, nothing to do
      return;
    case 1: // connect timeout
    case 4: // ping timeout
    case 8: // socket closed
    default: // any other error code is handled the same way
      console.log('###CONNECTION LOST###', error);
      this.disconnectMqttClient(connectionId).then(() => {
        this.events.emit('mqttClientConnectionLost', connectionId);
      });
  }
});

client.on('messageReceived', (msg) => {
  console.log('message received', msg.destinationName);
  console.log(this.mqttClients[connectionId].callbacks);
  var array = this.mqttClients[connectionId].callbacks[msg.destinationName];
  if (Array.isArray(array)) {
    array.forEach(cb => cb(msg));
  }
});

let options = {
  useSSL: true,
  mqttVersion: 4,
  cleanSession: true,
  keepAliveInterval: 15,
  timeout: 15000
};

console.log('Performing MQTT Connect');
return client.connect(options)
  .then(() => {
    console.log('Connected MQTT');
  });

Edit: As for the pinger changes, I can see a possible flaw: if a client is receiving a stream of incoming packets and never sends anything back, the server could think the client is no longer there. The client knows the connection is alive, but the server doesn't.
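On the client ID point: the configuration above draws from about a million possible suffixes, so two devices can occasionally collide. A minimal sketch of a wider generator is shown below. `makeClientId` is a hypothetical helper, not part of any MQTT library, and it uses `Math.random`, which is not cryptographically secure; it only aims to make accidental duplicates unlikely.

```javascript
// Hypothetical helper: builds "<prefix>-<timestamp>-<random suffix>".
// `random` and `now` are injectable so the output can be made deterministic in tests.
function makeClientId(prefix, random = Math.random, now = Date.now) {
  // Timestamp in base 36 keeps the ID short while separating IDs created at different times.
  const time = now().toString(36);
  // Eight base-36 characters give 36^8 (about 2.8e12) possible suffixes.
  let suffix = '';
  for (let i = 0; i < 8; i++) {
    suffix += Math.floor(random() * 36).toString(36);
  }
  return prefix + '-' + time + '-' + suffix;
}
```

Usage would look like `var clientId = makeClientId('myappname-client');`. Any length or character limits the broker places on client IDs still apply and are not checked here.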

MattyK14 commented 7 years ago

@clshortfuse I'm currently generating a random UUID for the Client ID. It times out at 2x the keepAliveInterval. When I get the timeout error I make it reconnect, and after a few more timeouts I get Error: AMQJS0007E Socket error: Unknown socket error. It seems to be client side and not because of IoT. Messages are successfully being published to topics and received using the AWS console.

@rh389 Yes messages are successfully going both ways until the disconnect!

I will try the fork hopefully later this week. Thanks for the input guys.

MattyK14 commented 7 years ago

After switching to @clshortfuse's fork I'm not getting timeouts anymore, but AWS IoT is closing the socket after 1.5x the keepAliveInterval if a publish is not made, as described in its documentation.

I haven't had a chance to pick through the source code, but I guess it's not sending the ping messages?
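One way to check that guess is to log the time of every outbound packet and compare the gaps against the 1.5x limit. The sketch below assumes the MQTT 3.1.1 rule that a server disconnects a client it has not heard from within one and a half times the keep-alive period; `serverTimeoutMs` and `wouldServerDisconnect` are hypothetical helper names, and the timestamps would have to come from your own logging.

```javascript
// Hypothetical helper: how long the server waits before dropping a silent client.
function serverTimeoutMs(keepAliveSeconds) {
  // A keep-alive of 0 turns the mechanism off, so there is no deadline.
  if (keepAliveSeconds === 0) return Infinity;
  return keepAliveSeconds * 1500; // 1.5 x keep-alive, in milliseconds
}

// Hypothetical helper: given the times (ms) at which the client sent packets,
// report whether any silent gap, including the gap up to nowMs, exceeds the limit.
function wouldServerDisconnect(sendTimesMs, keepAliveSeconds, nowMs) {
  const limit = serverTimeoutMs(keepAliveSeconds);
  const times = sendTimesMs.concat([nowMs]);
  for (let i = 1; i < times.length; i++) {
    if (times[i] - times[i - 1] > limit) return true;
  }
  return false;
}
```

With a keepAliveInterval of 15, the limit is 22.5 seconds, so a client that publishes nothing and sends no pings for longer than that would be expected to see the socket closed.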

robhogan commented 7 years ago

This can be closed, I'm pretty sure it was just a RN fork issue - it's covered by https://github.com/rh389/react-native-paho-mqtt/issues/4 and now fixed.