Azure / azure-iot-sdk-csharp

A C# SDK for connecting devices to Microsoft Azure IoT services

[Technical Question] Device sometimes stops receiving CloudToDeviceMessages #3365

Open Skaanning opened 1 year ago

Skaanning commented 1 year ago

Description of the issue

Hi. Every once in a while, one of my devices stops reacting to cloud-to-device (C2D) messages. The messages just keep accumulating: the "cloudToDeviceMessageCount" value in the device twin keeps getting bigger and bigger.

Normally the devices happily receive and process the messages, but sometimes they just stop. The only way I can get a device back to normal is to restart it (I can do that with a direct method call; those continue to work even when cloud-to-device messages don't). After the device reconnects to the IoT hub, it starts handling the messages again.

I wonder if I am doing something wrong in the setup of the device client.

The relevant code looks something like this.

NOTE: All the methods set up with SetMethodHandlerAsync continue to work, even when the message handler no longer does.

NOTE2: Maybe the ReceiveMessageHandler gets lost if the device disconnects, and I should call SetReceiveMessageHandlerAsync again whenever the device reconnects in the ConnectionStatusHandler?


// On device startup 
// lots of things happening - then at some point
var clientOptions = new ClientOptions
{
    SdkAssignsMessageId = SdkAssignsMessageId.WhenUnset,
    FileUploadTransportSettings = new Http1TransportSettings
    {
        ClientCertificate = auth.Certificate
    },
};

var deviceClient = DeviceClientWrapper.Create(registrationResult.AssignedHub, auth, TransportType.Mqtt, _logger, clientOptions);
var iotDeviceClient = CreateCloudClient(deviceClient);
await iotDeviceClient.SetupDeviceClientCallbacks();

// device just runs in a loop after this, only reacting on the callbacks/messages received and messages from an internal mqtt server
public async Task SetupDeviceClientCallbacks()
{
    _deviceClient.SetConnectionStatusChangesHandler(ConnectionStatusHandler);

    await _deviceClient.SetDesiredPropertyUpdateCallbackAsync(UpdateDesiredState, _config);

    foreach (var (name, callback) in Callbacks)
    {
        await _deviceClient.SetMethodHandlerAsync(name, callback, _config);
    }

    await _deviceClient.SetMethodDefaultHandlerAsync(FallbackHandler, null);

    await _deviceClient.SetReceiveMessageHandlerAsync(SetReceiveMessageHandlerAsync, _config, _cts.Token);
    await _deviceClient.OpenAsync(_cts.Token);
}

async Task SetReceiveMessageHandlerAsync(
    Message message,
    object userContext)
{
    try
    {
        message.Properties.TryGetValue("msgType", out var msgType);
        msgType ??= string.Empty;

        IMsgCommand? command = GetMessageFromType(msgType);

        if (command is null)
        {
            await _deviceClient.CompleteAsync(message);
            _logger.Warning("Rejected message, msgType=[{MessageType}] did not match any known message type", msgType);
            return;
        }

        _logger.Information("Received cloud-message, msgType=[{MessageType}], executing now", msgType);
        await _deviceClient.CompleteAsync(message);

        using var streamReader = new StreamReader(message.BodyStream);
        var msg = await streamReader.ReadToEndAsync();

        await command.Execute(msg);
    }
    catch (Exception e)
    {
        _logger.Error(e, "receive message failed");
        await _deviceClient.CompleteAsync(message);
    }
}
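
To make the idea in NOTE2 concrete, here is a minimal sketch of registering the receive handler again every time the connection comes back. It is only an illustration of the question, not a confirmed fix: nothing in this thread establishes that the handler is actually lost on disconnect. The types IC2dClient, LinkStatus, ReceiveHandlerRegistrar and RecordingClient are hypothetical stand-ins so that the snippet compiles without the SDK package; in the real code, OnStatusChangedAsync would be driven from ConnectionStatusHandler and the registration call would go to _deviceClient.SetReceiveMessageHandlerAsync.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins (NOT SDK types): they mirror only the calls used in the post.
public enum LinkStatus { Connected, Disconnected }

public interface IC2dClient
{
    // Stand-in for the wrapper's SetReceiveMessageHandlerAsync(handler, context, token).
    Task SetReceiveMessageHandlerAsync(
        Func<string, object, Task> handler, object userContext, CancellationToken ct);
}

// Registers the same receive handler again each time the link reports Connected.
public sealed class ReceiveHandlerRegistrar
{
    private readonly IC2dClient _client;
    private readonly Func<string, object, Task> _handler;
    private readonly object _userContext;

    // Serializes registrations in case status changes arrive close together.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    public ReceiveHandlerRegistrar(
        IC2dClient client, Func<string, object, Task> handler, object userContext)
    {
        _client = client;
        _handler = handler;
        _userContext = userContext;
    }

    // Intended to be called from the connection status callback.
    public async Task OnStatusChangedAsync(LinkStatus status, CancellationToken ct = default)
    {
        if (status != LinkStatus.Connected)
        {
            return; // Nothing to register while the link is down.
        }

        await _gate.WaitAsync(ct);
        try
        {
            await _client.SetReceiveMessageHandlerAsync(_handler, _userContext, ct);
        }
        finally
        {
            _gate.Release();
        }
    }
}

// Test double that only counts how often the handler was registered.
public sealed class RecordingClient : IC2dClient
{
    public int Registrations { get; private set; }

    public Task SetReceiveMessageHandlerAsync(
        Func<string, object, Task> handler, object userContext, CancellationToken ct)
    {
        Registrations++;
        return Task.CompletedTask;
    }
}
```

The semaphore is a design choice rather than a requirement: it keeps two registrations from overlapping if status changes arrive in quick succession. Whether registering again changes the behaviour described above would still have to be verified on an affected device.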

Anyway, I hope you can help. If you need any additional info, please let me know.

remcoros commented 7 months ago

Did you ever find out what is happening here?

We are running into this as well. Despite having implemented reconnect logic like in the sample, it seems at some point, some devices just stop responding to or receiving direct method calls.

We added logs all over the place, inside "ConnectionStatusChangeHandlerAsync" to log ALL the status changes, and inside all receive handlers.

When this happens, there is nothing logged about connection status changes, but direct method calls seem to just stop working.

When querying this device from the Azure portal, it says it is still connected and nothing points to any connection issues.

Edit: I'm re-reading your report now. Our issue is actually with direct method calls, not cloud-to-device messages.

Skaanning commented 7 months ago

I haven't had issues with direct method calls. But other than it being direct method calls instead of c2d messages, it sounds like the same underlying issue - or at least the same sort of behaviour.