aws / aws-iot-device-sdk-js

SDK for connecting to AWS IoT from a device using JavaScript/Node.js
Apache License 2.0

Publish speed #161

Closed: mjmelli closed this issue 6 years ago

mjmelli commented 6 years ago

I'm trying to do some very basic throughput testing on publishing messages to the IoT gateway (and then storing them in DynamoDB). I'm just putting a publish function in a loop.

Our use case will be sending quite a bit of rapid sensor data from a fleet of devices, so I'm trying to get a sense of what kind of throughput we can expect.

Right now I'm only able to get it to send out about 3 messages per second. Is this typical? Is there a way to speed this up?

Thanks.

fengsongAWS commented 6 years ago

Hi @mjmelli, you should be able to get more than that. The IoT limits are documented here (search for "IoT"): http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html

mjmelli commented 6 years ago

Hi @fengsongAWS,

Thanks for your response. I have seen the limits and I'm sure I'm nowhere close to hitting any of them, which is why I posted here. My network is stable and fast as well, so I'm really not sure why it's so slow. Any additional help with optimizing for rapid message publishing would be appreciated.

Thanks.

fengsongAWS commented 6 years ago

Hi @mjmelli, does this happen with the other SDKs as well, or is it specific to Node.js? Can you try another SDK, e.g. Python or embedded C, to see how much throughput you get there?

mjmelli commented 6 years ago

Hi @fengsongAWS,

Thanks for the suggestion. I don't know Python very well, but I installed the Python IoT SDK and set up a very similar loop to publish as many messages as possible, and saw drastically better performance than the Node SDK. I'm getting over 1000 messages/sec with Python and still can't figure out how to get more than ~3-5/sec with the Node SDK. The code was substantially similar - just a call to connect followed by a loop over a publish call. Both had the same very small JSON string payload and qos=0.

Note that in Python with qos=1, it was substantially slower, but still over 20 messages/sec. Changing qos on the Node publish doesn't seem to have much of an effect on speed.
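The kind of benchmark described above can be sketched roughly as follows. This is not the code used in the thread, which was never posted. `measurePublishRate`, the topic name, and the payload shape are hypothetical names chosen for illustration. The sketch assumes only that the publish function has the shape `(topic, payload, options, callback)`, which is how the SDK's `device.publish` is normally called.

```javascript
/**
 * Publish `count` small JSON messages as fast as possible and report the rate.
 *
 * `publish` is any function of the shape (topic, payload, options, callback).
 * With the real SDK this would be something like device.publish.bind(device);
 * that wiring is an assumption and is not part of this sketch.
 */
function measurePublishRate(publish, { topic = 'bench/topic', count = 1000, qos = 0 } = {}) {
  return new Promise((resolve, reject) => {
    if (count === 0) {
      resolve({ count: 0, elapsedSec: 0, messagesPerSec: 0 });
      return;
    }

    const startedAt = process.hrtime.bigint();
    let completed = 0;

    for (let i = 0; i < count; i++) {
      // Very small JSON payload, as in the test described above.
      const payload = JSON.stringify({ seq: i });

      publish(topic, payload, { qos }, (err) => {
        if (err) {
          reject(err);
          return;
        }
        completed += 1;
        if (completed === count) {
          // Elapsed time is measured up to the last publish callback.
          const elapsedSec = Number(process.hrtime.bigint() - startedAt) / 1e9;
          resolve({ count, elapsedSec, messagesPerSec: count / elapsedSec });
        }
      });
    }
  });
}
```

Running the same function with `qos: 0` and `qos: 1` gives the comparison mentioned above. Note that the rate is based on when the publish callbacks fire, which may not mean the same thing in every client library.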

fengsongAWS commented 6 years ago

Hi @mjmelli, I think you did not wait for the connection to become stable. All of your publishes were pushed into the queue and are being drained sequentially.

You need to wait for minimumConnectionTimeMs so that the device marks the connection as stable, and then do the batch publish. This should work for you.

mjmelli commented 6 years ago

Hi @fengsongAWS,

Thanks for this advice. I added a timeout to wait before publishing messages and this did indeed fix my problem; the messages went through very fast.

I see that the default value for minimumConnectionTimeMs is 20 seconds, but I tried it with a wait of just 1 second and that also seemed to work. Is there a value that can be considered "safe" for a real-world application?

Our IoT device will be deployed in consumers' homes, and this seems like a really finicky way to handle this. How can we ever be sure that the connection is "stable" other than just waiting for it? How can we test for an "unstable" connection?

What happens if the connection is dropped and reconnects, is this also a case where we will then have to wait to start re-publishing?

Thanks for your insight.

fengsongAWS commented 6 years ago

Actually, my previous statement was not accurate. In your case, you do not necessarily need to wait for minimumConnectionTimeMs, because as long as the connection is not inactive, you are good to go. The real problem here is that you need to wait for the drainer to finish, which is scheduled to start and be cleared after 250 ms (by default) once connected. To get the fastest performance, please do not do any pub/sub before connect, because this puts all of those operations into the offline queue, where they wait to be drained.
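A minimal sketch of that advice, assuming `device` is the event-emitter-style object returned by `awsIot.device(...)`, with a `'connect'` event and a `publish(topic, message, options)` method. `waitUntilReady`, `publishBatchWhenReady`, and `settleMs` are hypothetical names, not part of the SDK. The 250 ms default mirrors the figure quoted above. If queued messages are drained one per interval (an assumption, though it is consistent with the numbers in this thread), the ceiling while draining would be 1000 / 250 = 4 messages per second, which matches the 3-5 per second reported earlier.

```javascript
/**
 * Resolve once `device` has emitted 'connect' and a further `settleMs`
 * milliseconds have passed. Nothing is published before that point, so
 * nothing should end up in the offline queue.
 */
function waitUntilReady(device, settleMs = 250) {
  return new Promise((resolve) => {
    device.once('connect', () => setTimeout(resolve, settleMs));
  });
}

/**
 * Wait for the connection to settle, then publish every payload in one batch.
 * Returns the number of messages handed to device.publish.
 */
async function publishBatchWhenReady(device, topic, payloads, { qos = 0, settleMs = 250 } = {}) {
  await waitUntilReady(device, settleMs);

  for (const payload of payloads) {
    device.publish(topic, JSON.stringify(payload), { qos });
  }
  return payloads.length;
}

// Hypothetical usage with the real SDK (connection options omitted):
//   const awsIot = require('aws-iot-device-sdk');
//   const device = awsIot.device({ /* keyPath, certPath, caPath, clientId, host */ });
//   publishBatchWhenReady(device, 'sensors/readings', readings);
```

The listener is attached with `once`, so this covers only the first connection. Whether the same wait is needed after a reconnect is not answered in this thread.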

mjmelli commented 6 years ago

Thanks @fengsongAWS. This is helpful, and waiting 250 ms after connect also seems to work.

Thanks for your help with this!

mlfarrell commented 5 years ago

This needs to be FAR better documented. The penalty for not waiting for the min duration is catastrophic. I almost posted a duplicate of this issue today.