Azure / azure-iot-sdk-csharp

A C# SDK for connecting devices to Microsoft Azure IoT services
Other
464 stars 493 forks source link

Device messages with DTDL component name fail over HTTP #1895

Closed bjorn-jarisch closed 3 years ago

bjorn-jarisch commented 3 years ago

Context

Description of the issue

HttpTransportHandler fails to translate message.SystemProperties["dt-subject"] because of missing mapping in CustomHeaderConstants, so any telemetry messages from devices following IoT Plug and Play conventions (i.e. setting the "dt-subject" system property) fail to be sent over HTTP.

drwill-ms commented 3 years ago

Thanks Bjorn. We see your submitted PR (thanks for that) which looks reasonable. We are evaluating it.

drwill-ms commented 3 years ago

@bjorn-jarisch upon further review, we remembered that PnP is officially not supported over HTTP. Even if adding this header sends the component name correctly, there is still not a way to send the device's model Id upon connect, so the overall scenario would be broken.

If I've misunderstood the scenario, please let me know.

The service team shared this documentation where it states:

IoT Plug and Play devices must use MQTT or MQTT over WebSockets. Other protocols such as AMQP or HTTP are not valid to implement IoT Plug and Play devices.

If you'd like to formally request this support, you can reach the IoT Hub service team here.

I'll wait for your response before we close this issue and the corresponding PR to ensure we're on the same page.

bjorn-jarisch commented 3 years ago

@drwill-ms I would argue that throwing a KeyNotFoundException is not a great way to indicate that IoT Plug and Play is not supported over HTTP, and that implementing this minimal fix so that the client does not throw an exception just because a ComponentName is set is a better user experience, especially if the ComponentName is set inadvertently, or for another purpose (for example, we re-used iothub-message-schema before dt-subject came along).

As for our scenario: We are injecting telemetry on behalf of an IoT PnP device using an IoT Bridge (such as this one https://docs.microsoft.com/en-us/azure/iot-central/core/howto-build-iotc-device-bridge). This requires us to a) set the dt-subject so that the injected telemetry is associated with the correct component and b) connect using HTTP to be able to reuse connections at scale.

This scenario works 100% using an internal build of the Microsoft.Azure.Devices.Client package with the suggested fix applied, and is crucial to the functionality of our Bridge. If the PR is rejected we will have to continue using an internal build of the client, but I would really like to avoid that.

drwill-ms commented 3 years ago

@bjorn-jarisch you are 100% right - we have a lousy experience when trying to use the SDK in a way the service doesn't support a combo of settings. A KeyNotFoundException is a terrible way to find out. We should have considered that when we added PnP support but I suspect we missed trying that scenario.

I'm not opposed to taking the change to at least get rid of the exception, however I'm still worried you're going to run into other problems trying to get this to work and I want to prepare you for that. A device must have connected at least once over MQTT and specified its model Id, otherwise it will not be seen as a PnP device. If all device communication occurs over this HTTP bridge, how will that work?

The .NET client supports multiplexing connections over AMQP, which is used by customer-created device gateways. Note, this is how Edge works whether present as a simple gateway or as a full-fledged Edge device with modules and such. We added support for PnP to AMQP just for .NET client because of this scenario with Edge and multiplexing clients. Might this work better for your use case?

drwill-ms commented 3 years ago

@bjorn-jarisch we're looking into the help topic you linked and working with the IoT Central team to understand how they intended this to work with components. I promise we'll either add the support you need to this SDK, or find a viable solution for you.

bjorn-jarisch commented 3 years ago

@drwill-ms thanks for looking into this. We are ever only injecting additional telemetry for a device after it has been provisioned at an IoT Hub (before that we wouldn't even know where to send the telemetry), from the point of view of the Hub everything we do is fine (apart from the HTTP transport bit). Also, to elaborate, most of the telemetry is coming from the device itself, we're only adding some stuff that must be done in the cloud (location triangulation, data usage lookup etc.).

drwill-ms commented 3 years ago

@bjorn-jarisch I talked with Ian Hollier from the IoT Central team. He's familiar with you and your efforts, and provided some helpful background.

We're going to go ahead and accept your PR because I don't see any harm it can do, but I want to remind you that the service (and by proxy the SDKs) does not officially support PnP over HTTP. Although this will get you through this specific issue, I'm worried you may run into other issues.

bjorn-jarisch commented 3 years ago

@drwill-ms thanks for the heads-up, we'll burn that bridge when we get to it :)

drwill-ms commented 3 years ago

Merging the PR closed the issue. I don't want it closed until we release an SDK version with this in it.

drwill-ms commented 3 years ago

@bjorn-jarisch do you specify the modelId when initializing the device client? We have another request in to throw an exception if someone initializes with HTTP and specifies a modelId, so the user knows up front the scenario is not supported. Would this break you?

bjorn-jarisch commented 3 years ago

@drwill-ms no, this would not break us. Our bridge does not announce a modelId; it relies on the fact that the device has already provisioned itself (and announced its modelId) previously.

drwill-ms commented 3 years ago

@bjorn-jarisch this has been released.

Release notes: https://github.com/Azure/azure-iot-sdk-csharp/releases/tag/2021-05-06 Device SDK version: 1.37.1