Azure / azure-iot-sdk-node

A Node.js SDK for connecting devices to Microsoft Azure IoT services
https://docs.microsoft.com/en-us/azure/iot-hub/
Other
262 stars 227 forks source link

sendEvent message size/quota exceeded errors not reported #501

Closed bragma closed 3 years ago

bragma commented 5 years ago

Context

Description of the issue:

mqtt.sendEvent does not call the specified callback if the message size is larger than the max accepted by IoT Hub. In my code I am using a race between sendEvent (promisified) and a timeout promise. I've noticed that in some cases it always times out and found the root cause to be sendEvent not notifying anything if the supplied message is too large. This should never happen as it can break external logic.

Notice that NoRetry should not be involved, since this sent attempt will never succeed.

UPDATED: This also seems to happen if the Hub daily quota of messages has been exceeded.

Code sample exhibiting the issue:

'use strict';
var connectionString = 'MYHUBCONNECTIONSTRING';

var Mqtt = require('azure-iot-device-mqtt').Mqtt;
var DeviceClient = require('azure-iot-device').Client;
var Message = require('azure-iot-device').Message;

var client = DeviceClient.fromConnectionString(connectionString, Mqtt);

// Print results.
function printResultFor(op) {
  return function printResult(err, res) {
    if (err) console.log(op + ' error: ' + err.toString());
    if (res) console.log(op + ' status: ' + res.constructor.name);
  };
}

var msgCount = 240;

// Create a message and send it to the IoT hub every second
setInterval(function(){
  var data = JSON.stringify({ data: 'x'.repeat(1024 * msgCount) });
  console.log("Message length: ", data.length);

  var message = new Message(data);

  client.sendEvent(message, printResultFor('send'));

  msgCount++;
}, 1000);

Console log of the issue:

C:\Devel\iot>node send-size.js
Message length:  245771
send status: MessageEnqueued
Message length:  246795
send status: MessageEnqueued
...
Message length:  260107
send status: MessageEnqueued
Message length:  261131
send status: MessageEnqueued
Message length:  262155 <-- NO MORE CALLBACKS
Message length:  263179
Message length:  264203
Message length:  265227
Message length:  266251

AB#7366638

pierreca commented 5 years ago

Thanks for reporting this. it definitely sounds like a bug. We'll try to set up a repro and comment if we have more questions.

pierreca commented 5 years ago

On initial investigation it looks like the service bluntly disconnects the client when the message is too big, without giving a proper error. At that point the retry logic kicks in, which is why you're not getting notified until much later (probably around 4 or 5 minutes later).

When using MQTT I'm not sure there is a mechanism for the service to notify the client which type of error is causing the disconnection. I need to dig into it a little bit more to figure out if there's anything "hidden" in the MQTT library that we should pay attention to to help us detect that (either an error, or some client-side validation...).

we generally shy away from client-side validation because it's a very good way to have the SDK and the service capabilities diverge. This being said nothing prevents you from testing this on your end.

I will have a chat with the service team to see if this is something we can address and update this issue accordingly.

bragma commented 5 years ago

Thanks. Considering how often services are updated, it's ok to try to avoid client-side checks, but then server-side errors must be properly reported. If this is not possible, it will be in some cases force the client to check that. In this specific case, no retry will fix it until something times out. In our code, where retry is disabled and sendEvent races with our timeout timer, we totally missed the problem until we noticed the missing data on the server side. Some way of detecting the error could have helped out a lot.

pierreca commented 5 years ago

Considering how often services are updated, it's ok to try to avoid client-side checks, but then server-side errors must be properly reported. If this is not possible, it will be in some cases force the client to check that

I agree with that and just had a talk with the service team. There's no way for them to signal us what error caused the disconnect, which means we need a client side check. I also looked at mqtt.js features and it doesn't seem like they have a mechanism to check/limit message size.

To get around the problem you faced, we're going to have to build it directly into the SDK itself. I'll add this to our backlog and check if other SDKs do it.

pierreca commented 5 years ago

FYI I'm working on a small PR on the mqttjs/mqtt-packet project. This is where the code that compute the packet size resides and I think it'd be dangerous to duplicate this in our SDK. I'm going to see if they would be OK with accepting an additional feature that allows for client-side packet size checking.

One of the reason they may not accept this is that it collides with new MQTT 5 features that allow this sort of limitations as part of the protocol, when the device first connect. MQTT 5 also has error codes in the PUBACK which would allow to return an error type. Therefore there are no guarantees that they will accept this PR. if they refuse, we can always reconsider adding packet-size computation into the SDK

Last but not least, a friend from the service team suggested to use Azure Monitoring to check for errors on the service side. A device probably wouldn't connect to Azure Monitoring but your service application could and then it could diagnose why a device is failing to send messages (doesn't help the dev experience on the device but at least offers the IoT solution developer an option).

By the way, quota exceeded is an error that the SDK won't be able to cover and for which the service will just disconnect the device. This is the appropriate service behavior in MQTT (see the last line of section 3.3.5 of the MQTT specification)

All in all I guess we just have to live with the limitations of the transport protocol. AMQP is a tad more complicated and heavy weight but does enable richer error handling.

pierreca commented 5 years ago

update - the fix is bigger than expected (needs at least 3PRs in different repos) so I'm considering a different way of doing it (ie expose an API to compute the packet size on mqtt-packet rather than bubble up errors and add configuration APIs on mqtt.js)

anthonyvercolano commented 3 years ago

This is an issue that we won't fix in the current implementation of the SDK.

anthonyvercolano commented 3 years ago

Does not actually seem closed. Trying again