aws / aws-iot-device-sdk-js

SDK for connecting to AWS IoT from a device using JavaScript/Node.js
Apache License 2.0

Connection lost every 2-3 Minutes by some users #280

Closed klotzma closed 5 years ago

klotzma commented 5 years ago

Hello,

I am opening this issue to raise a long-standing problem again, hoping to find help, because I can only use the IoT service in a very limited way.

I sometimes have to repeat a command to my Google Home several times before it runs, whether via IFTTT or the direct connection to Google Home. I quote: "The service Iobroker is currently unavailable".

2019-08-11 22:16:54.958 - error: iot.0 Error by device connection: "read ECONNRESET"
2019-08-11 22:16:54.958 - info: iot.0 Connection changed: disconnect
2019-08-11 22:16:54.958 - info: iot.0 Connection lost
2019-08-11 22:17:00.268 - info: iot.0 Connection changed: connect
2019-08-11 22:22:00.541 - error: iot.0 Error by device connection: "read ECONNRESET"

Same problem as in this closed issue: Aws Iot Issue #251

bretambrose commented 5 years ago

At this point, assuming you've checked all of the suggestions in the referenced thread, I think the best chance for progress is to provide a Wireshark capture covering the initial connection through the first disconnect.

klotzma commented 5 years ago

I'll let Wireshark run and post the capture here.

klotzma commented 5 years ago

Can someone help me with how I record this with Wireshark?

justinboswell commented 5 years ago

Just open wireshark, and capture with a filter of host <your endpoint>-ats.iot.<region>.amazonaws.com, then run your program. That'll capture only packets going to your IoT endpoint, which is what you want.

klotzma commented 5 years ago

What do you mean by "my endpoint"? Where can I find it?

justinboswell commented 5 years ago

The endpoint you were assigned for your Thing in the AWS Console. It should look something like a16523t7iy5uyg-ats.iot.us-east-1.amazonaws.com.

klotzma commented 5 years ago

I recorded the traffic, but it still has to be filtered; I did not manage to filter it properly. My cloud endpoint is a18wym7vjdl22g.iot.eu-west-1.amazonaws.com. The connection abort should be visible towards the end. Here is the zip: protocol.zip

[Image: iot]

[Image: iot2]

justinboswell commented 5 years ago

That looks like a normal 5 minute timeout hangup from the service, because no packets were sent from the client to the service. If you look at the timestamps, it's exactly 300 seconds.
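The same interval also shows up in the adapter log quoted in the first comment, between the "connect" line and the next ECONNRESET. As a small worked check (the timestamps are copied from that log; treating them as UTC is only an assumption to make them parseable and does not affect the difference):

```javascript
// Timestamps copied from the adapter log in the first comment.
// The trailing "Z" (UTC) is an assumption; only the difference matters here.
const connected = new Date('2019-08-11T22:17:00.268Z');
const reset = new Date('2019-08-11T22:22:00.541Z');

// Subtracting two Date objects yields milliseconds.
const elapsedSeconds = (reset - connected) / 1000;

console.log(elapsedSeconds); // 300.273
```

That is roughly 300 seconds plus a fraction of a second, which matches the 5 minute hangup described above.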

klotzma commented 5 years ago

But where do the problems come from? Others do not have them, and the service works for them. I always get errors.

klotzma commented 5 years ago

Are you talking about the pictures now, or did you look at the log? Because there is not much to see in the pictures.

klotzma commented 5 years ago

I do not believe the pictures say anything. I only posted them to show that the capture worked, because I do not understand anything that is in the recording.

klotzma commented 5 years ago

I have recorded the traffic once again, this time for longer. Again it is unfiltered and has to be filtered, which I cannot do myself. vm100neu2.pcap.zip

justinboswell commented 5 years ago

I filtered the log you posted down (tcp.port == 8883 works). Have you tried setting keepalive in the options structure you pass to the Device constructor to something like 60 seconds? Do you have logs from the session? Set debug = true on the options passed to the Device constructor, and you should get some logs.
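Alongside the SDK's own debug output, an application can also timestamp connection events itself. The following is only a sketch: `attachConnectionLogging` is a hypothetical helper, not part of the SDK, and it assumes the device object is an event emitter that emits `connect`, `close` and `error`. A plain Node.js `EventEmitter` stands in for the real device so the snippet runs without AWS credentials.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper (not part of the SDK): logs connection events and
// how long the connection stayed up before it closed.
function attachConnectionLogging(device, log = console.log, now = Date.now) {
  let connectedAt = null;

  device.on('connect', () => {
    connectedAt = now();
    log('connect');
  });

  device.on('close', () => {
    const uptime = connectedAt === null
      ? 'unknown'
      : `${(now() - connectedAt) / 1000} s`;
    connectedAt = null;
    log(`close after ${uptime}`);
  });

  device.on('error', (err) => log(`error: ${err.message}`));

  return device;
}

// Stand-in for the real device, used only to demonstrate the helper.
const demo = attachConnectionLogging(new EventEmitter());
demo.emit('connect');
demo.emit('error', new Error('read ECONNRESET'));
demo.emit('close');
```

With a real device, the uptime printed on `close` would make a fixed disconnect interval easy to spot in the application log.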

klotzma commented 5 years ago

I captured it with tcpdump because this system runs in a virtual machine under Proxmox. Can I also set something in tcpdump?

justinboswell commented 5 years ago

No, because all of the packets are encrypted with TLS, so we can only guess at their contents by size. Furthermore, we want to see how the SDK is reacting to incoming/outgoing messages.

klotzma commented 5 years ago

And does Wireshark decode this while capturing it?

klotzma commented 5 years ago

The iot service runs in a VM on my server, and the server is connected to my router via LAN. I have Wireshark on a Windows laptop. How can I record the traffic on the LAN from there?

klotzma commented 5 years ago

I have used the "capturing" feature of my router, and here is the capture: !!iad-if-lan_06.09.19_2127.zip Can you find more in this one?

justinboswell commented 5 years ago

None of what we're asking for is wireshark related. I need the logs from your application, so I can see what the SDK thinks is happening.

klotzma commented 5 years ago

I set the log level to silly and to debug and uploaded the logs here. log.zip iobroker.2019-09-06debug.zip Or do you mean something else? If so, can you tell me what you need? I am not a professional.

klotzma commented 5 years ago

I have now let my router capture the traffic and set the log of the application to debug in parallel.

I would have liked to record for longer, but uploads here are limited to 10 MB.

graebm commented 5 years ago

The wireshark logs from the router are not very helpful, since the data is all encrypted. You do not need to keep including those.

Have you tried setting a shorter keepalive value? Like:


var awsIot = require('aws-iot-device-sdk');

// Replace the placeholder strings with your own values.
var device = awsIot.device({
  debug: true,    // print SDK debug output
  keepalive: 60,  // keepalive interval in seconds
  keyPath: '<YourPrivateKeyPath>',
  certPath: '<YourCertificatePath>',
  caPath: '<YourRootCACertificatePath>',
  clientId: '<YourUniqueClientIdentifier>',
  host: '<YourCustomEndpoint>'
  // ...
});

klotzma commented 5 years ago

OK, sorry.

Where do I have to change that "keepalive"? In the application where the iot service is running?

klotzma commented 5 years ago

> The wireshark logs from the router are not very helpful, since the data is all encrypted. You do not need to keep including those.

If you can tell me how to record it unencrypted, I will make a new capture. Unfortunately, I am only a user and not a developer.

GermanBluefox commented 5 years ago

> OK sorry.
>
> Where do I have to change that "keepalive" ? In the application where the iot service is running?

This is the place that should be edited: https://github.com/ioBroker/ioBroker.iot/blob/master/main.js#L767

device = new DeviceModule({
    privateKey: new Buffer(certs.private),
    clientCert: new Buffer(certs.certificate),
    caCert:     fs.readFileSync(__dirname + '/keys/root-CA.crt'),
    clientId,
    username:   'ioBroker',
    host:       adapter.config.cloudUrl,
    debug:      !!adapter.config.debug,
    baseReconnectTimeMs: 5000,
    keepalive: 60, // ADD THIS LINE
});

The file can be found at /opt/iobroker/node_modules/iobroker.iot/main.js.

Of course, you must restart "iot" after changing the file.

Apollon77 commented 5 years ago

It would be interesting to see whether it helps.

klotzma commented 5 years ago

2019-09-10 17:51:07.137 - error: Caught by controller[0]: /opt/iobroker/node_modules/iobroker.iot/main.js:777
2019-09-10 17:51:07.138 - error: Caught by controller[0]: keepalive: 60,
2019-09-10 17:51:07.138 - error: Caught by controller[0]: ^^^^^^^^^
2019-09-10 17:51:07.138 - error: Caught by controller[0]: SyntaxError: Unexpected identifier
2019-09-10 17:51:07.138 - error: Caught by controller[0]: at Module._compile (internal/modules/cjs/loader.js:723:23)
2019-09-10 17:51:07.138 - error: Caught by controller[0]: at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
2019-09-10 17:51:07.138 - error: Caught by controller[0]: at Module.load (internal/modules/cjs/loader.js:653:32)
2019-09-10 17:51:07.138 - error: Caught by controller[0]: at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
2019-09-10 17:51:07.138 - error: Caught by controller[0]: at Function.Module._load (internal/modules/cjs/loader.js:585:3)
2019-09-10 17:51:07.138 - error: Caught by controller[0]: at Function.Module.runMain (internal/modules/cjs/loader.js:831:12)
2019-09-10 17:51:07.138 - error: Caught by controller[0]: at startup (internal/bootstrap/node.js:283:19)
2019-09-10 17:51:07.139 - error: Caught by controller[0]: at bootstrapNodeJSCore (internal/bootstrap/node.js:622:3)
2019-09-10 17:51:07.139 - error: host.iobroker instance system.adapter.iot.0 terminated with code 1 ()
2019-09-10 17:51:07.139 - info: host.iobroker Restart adapter system.adapter.iot.0 because enabled
2019-09-10 17:51:37.153 - info: host.iobroker instance system.adapter.iot.0 started with pid 18328

This error appears and the adapter stops working.

Apollon77 commented 5 years ago

Did you forget to add a "," at the end of the previous line?

klotzma commented 5 years ago

Yes, I forgot a comma. The iot adapter has not been disconnected for 1.5 hours
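For anyone hitting the same SyntaxError after editing the file by hand, the effect of the missing comma can be reproduced in isolation. This is only a sketch: the two object literals are hypothetical, reduced versions of the options object, and `check` is an illustrative helper that compiles a snippet without running the adapter.

```javascript
// Hypothetical, reduced versions of the edited options object.
const missingComma = `({
  baseReconnectTimeMs: 5000
  keepalive: 60,
})`;

const withComma = `({
  baseReconnectTimeMs: 5000,
  keepalive: 60,
})`;

// Compile a snippet without executing it; return the error name on failure.
function check(source) {
  try {
    new Function(`return ${source};`);
    return 'ok';
  } catch (err) {
    return err.name;
  }
}

console.log(check(missingComma)); // SyntaxError
console.log(check(withComma));    // ok
```

The exact error message differs between Node.js versions, so the sketch reports only the error name.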

GermanBluefox commented 5 years ago

> Yes, I forgot a comma. The iot adapter has not been disconnected for 1.5 hours

Very interesting....

GermanBluefox commented 5 years ago

@graebm What is the suggested value for keepalive?

Apollon77 commented 5 years ago

@graebm just for my understanding: What is the default when it is not set? Or is no "keep alive" done at all in that case?

klotzma commented 5 years ago

What did the "keepalive" change? In simple terms, please, for people like me. I'm not very familiar with JavaScript.

GermanBluefox commented 5 years ago

> @graebm just for my understanding: What is the default when it is not set? Or is no "keep alive" done at all in that case?

keepalive: used to specify the time interval for each ping request. Default is set to 300 seconds to connect to AWS IoT.
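One way to read the thread so far is sketched below. It is a simplified model, not documented behavior of the SDK or the service. It assumes the connection is otherwise silent, takes the 300 second hangup observed in the capture as the idle cutoff, and treats the first keepalive ping as due exactly `keepalive` seconds after the last packet. The helper name and the strict comparison are assumptions for illustration.

```javascript
// Idle cutoff observed in the capture discussed earlier in this thread (seconds).
const OBSERVED_IDLE_CUTOFF_S = 300;

// Hypothetical helper: on an otherwise silent connection, the first keepalive
// ping is due `keepaliveS` seconds after the last packet. Returns true only if
// that moment falls strictly before the idle cutoff.
function firstPingBeatsCutoff(keepaliveS, cutoffS = OBSERVED_IDLE_CUTOFF_S) {
  return keepaliveS < cutoffS;
}

for (const keepaliveS of [300, 60]) {
  console.log(`keepalive ${keepaliveS} s: ping in time = ${firstPingBeatsCutoff(keepaliveS)}`);
}
```

Under these assumptions, a 300 second keepalive leaves no margin against a 300 second cutoff, while a 60 second keepalive sends several pings inside the same window.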

klotzma commented 5 years ago

Since I added the "keepalive" line, there have been no more disconnects or "ECONNRESET" errors in the log.

Apollon77 commented 5 years ago

Cool... Then we will prepare a new release with this change soon. It would also be interesting to know which other values work, so that the default or the "best practice" could be adapted to prevent such problems for other users as well.

pedro-brito-automa commented 2 years ago

I don't fully understand how keepalive affects the connection. I was having the same problem here, and it was solved by setting keepalive to 60. Is it related to the time the server needs to keep the connection alive? Could 300 s be longer than the server needs to work well?