eclipse-hono / hono

Eclipse Hono™ Project

https://eclipse.dev/hono

Eclipse Public License 2.0

Eclipse Hono on Microsoft Azure - AMQP Messaging Network proposal #1154

Open erryB opened 5 years ago

erryB commented 5 years ago

AMQP Messaging Network proposal

The initial idea mentioned in the Issue 1120 was to leverage Azure Event Hubs, but in order to handle bidirectional communication and to avoid Partition Key complexity we believe that Azure IoT Hub is probably the best option.

Following Hono's approach, we split the concept into two separate scenarios: Device to Cloud - Telemetry and Events and Cloud to Device - Command and Control.

In both cases, note that the proposed diagrams do not yet include any diagnostics or logging elements. The main goal of this post is to start a conversation about the overall architecture; logging will of course be taken into account later, during the implementation.

Any feedback on this architecture would be much appreciated. It would be very useful for us to learn whether we missed any concerns or whether there are other interesting ideas for moving forward.

Thanks!

Erica

Device to Cloud: Telemetry and Events

Here you can see a draft of the architecture for D2C data. We highlighted in blue the components we believe should be added to have the best experience with Azure.

[Figure: telemetry architecture diagram]

Data coming from the Device goes through the Protocol Adapters and reaches Azure IoT Hub. An important point here is the IoT Hub Device ID, which should be composed of the Tenant ID and the Hono Device ID so that it is unique and messages can be addressed properly.
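To make the composite ID idea concrete, here is a minimal Python sketch. The `:` separator, the percent-encoding step and both function names are assumptions made for illustration; the proposal does not specify how the two IDs are combined. IoT Hub also restricts the length and character set of device IDs, so a real implementation would need to validate the result against the official limits.

```python
from typing import Tuple
from urllib.parse import quote, unquote

SEPARATOR = ":"  # hypothetical choice, not taken from the proposal


def to_iot_hub_device_id(tenant_id: str, device_id: str) -> str:
    """Combine a Hono tenant ID and device ID into one IoT Hub device ID."""
    # Percent-encode each part so the separator can never occur inside it.
    return f"{quote(tenant_id, safe='')}{SEPARATOR}{quote(device_id, safe='')}"


def from_iot_hub_device_id(iot_hub_device_id: str) -> Tuple[str, str]:
    """Recover (tenant_id, device_id) from a combined IoT Hub device ID."""
    tenant_part, device_part = iot_hub_device_id.split(SEPARATOR, 1)
    return unquote(tenant_part), unquote(device_part)


combined = to_iot_hub_device_id("DEFAULT_TENANT", "4711")
print(combined)                          # DEFAULT_TENANT:4711
print(from_iot_hub_device_id(combined))  # ('DEFAULT_TENANT', '4711')
```

Encoding both parts keeps the mapping reversible, which the cloud side needs in order to route a message back to its tenant.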

IoT Hub uses partitions behind the scenes. The number of partitions has a direct impact on logical ordering, on the number of readers and, of course, on overall performance. It is very important to remember that this number can only be selected at creation time and cannot be changed afterwards. Using IoT Hub lets us avoid defining a partition key ourselves, and the way we construct the IoT Hub Device ID guarantees that all messages for a specific device end up in the same partition and are processed in logical order.
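The ordering argument can be illustrated with a small Python sketch. The hash function below is only a stand-in: the mapping IoT Hub actually uses is internal to the service, and `partition_for` is a hypothetical name. The point is that any deterministic mapping from device ID to partition keeps a device's messages together, as long as the partition count stays fixed.

```python
import hashlib


def partition_for(iot_hub_device_id: str, partition_count: int) -> int:
    """Map a device ID to a partition with a stable hash (illustrative only)."""
    digest = hashlib.sha256(iot_hub_device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


PARTITION_COUNT = 4  # fixed when the hub is created

for device in ("tenantA:sensor-1", "tenantA:sensor-2", "tenantB:sensor-1"):
    # The same device ID always yields the same partition number.
    print(device, "->", partition_for(device, PARTITION_COUNT))
```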

The messages are then available on the Cloud in different IoT Hub Consumer Groups. In order to handle them properly and forward them to the correct LOB Application Consumer, we need to introduce a component, here called Event Processor. The current implementation idea is to use Kubernetes StatefulSets, and we need to guarantee that only one instance is running to process a specific partition.
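One way to get exactly one reader per partition from a StatefulSet is to derive the partitions from the pod's ordinal, since StatefulSet pods are named `<name>-<ordinal>`. The following Python sketch assumes a simple modulo assignment; the pod name `event-processor` and both function names are hypothetical.

```python
import re
from typing import List


def pod_ordinal(pod_name: str) -> int:
    """Extract the ordinal from a StatefulSet pod name such as 'event-processor-1'."""
    match = re.search(r"-(\d+)$", pod_name)
    if match is None:
        raise ValueError(f"not a StatefulSet pod name: {pod_name!r}")
    return int(match.group(1))


def owned_partitions(pod_name: str, replicas: int, partition_count: int) -> List[int]:
    """Return the partitions this pod should read, so each has exactly one owner."""
    ordinal = pod_ordinal(pod_name)
    return [p for p in range(partition_count) if p % replicas == ordinal]


print(owned_partitions("event-processor-0", replicas=2, partition_count=4))  # [0, 2]
print(owned_partitions("event-processor-1", replicas=2, partition_count=4))  # [1, 3]
```

A static assignment like this avoids lease coordination between instances, at the cost of a rebalance whenever the replica count changes.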

The main goals of this component are the following:

- Identify the correct Endpoint that should receive the message. This can be done using the Tenant ID to discriminate.
- Handle checkpoints. This is an important point because we need to persist the sequence number in order to know which messages have already been consumed by this component.
- Have different behaviors depending on the connectivity status of the LOB Application Consumer:
  - Online scenario: the message can be forwarded directly to the AMQP 1.0 Hono Endpoint and ultimately to the LOB Application.
  - Offline scenario: if required by the QoS level, we need to queue the messages so they can be delivered later. Our idea is to add an Azure Service Bus Queue to store undelivered messages, which will be forwarded to the proper LOB Application as soon as it becomes available. This is necessary for all Event messages. One important thing to underline is that the throughput supported by an Azure Service Bus Queue is lower than that of Azure IoT Hub. Specific values depend on the implementation, but write rates differ by orders of magnitude, so we might need to handle the mismatch, which would increase the complexity of the proposed solution. This is still an open point in this architecture.
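The three responsibilities above can be sketched in Python as follows. Everything here is an in-memory stand-in: `Consumer` represents a LOB application behind the AMQP 1.0 endpoint, the `deque` represents the Azure Service Bus Queue, and all class and field names are hypothetical. A real Event Processor would keep one checkpoint per partition and persist it durably.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Message:
    tenant_id: str
    sequence_number: int
    kind: str  # "telemetry" or "event"
    payload: bytes


@dataclass
class Consumer:
    """Stand-in for a LOB application reachable through an AMQP 1.0 endpoint."""
    online: bool = True
    received: List[Message] = field(default_factory=list)


class EventProcessor:
    def __init__(self, consumers: Dict[str, Consumer]) -> None:
        self.consumers = consumers
        self.checkpoint = -1  # sequence number of the last handled message
        self.fallback_queue: Deque[Message] = deque()  # stand-in for Service Bus

    def process(self, message: Message) -> None:
        # 1. Pick the target by tenant ID.
        consumer = self.consumers.get(message.tenant_id)
        if consumer is not None and consumer.online:
            consumer.received.append(message)       # online: forward directly
        elif message.kind == "event":
            self.fallback_queue.append(message)     # offline: keep events for later
        # Telemetry for an unreachable consumer is dropped in this sketch.
        # 2. Record progress only after the message has been handled.
        self.checkpoint = message.sequence_number

    def flush(self) -> None:
        """Deliver queued events to consumers that have come back online."""
        still_waiting: Deque[Message] = deque()
        while self.fallback_queue:
            message = self.fallback_queue.popleft()
            consumer = self.consumers.get(message.tenant_id)
            if consumer is not None and consumer.online:
                consumer.received.append(message)
            else:
                still_waiting.append(message)
        self.fallback_queue = still_waiting


consumers = {"tenantA": Consumer(online=True), "tenantB": Consumer(online=False)}
processor = EventProcessor(consumers)
processor.process(Message("tenantA", 0, "telemetry", b"21.5"))
processor.process(Message("tenantB", 1, "event", b"door-open"))
processor.process(Message("tenantB", 2, "telemetry", b"19.0"))
consumers["tenantB"].online = True
processor.flush()
print(processor.checkpoint, len(consumers["tenantB"].received))  # 2 1
```

Advancing the checkpoint after queuing is only safe if the fallback queue is durable, which is the role Service Bus plays in the proposal.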

Cloud to Device - Command and Control

As you can see in the image below, the architecture is simpler because we do not need to provide offline support.

[Figure: C2D architecture diagram]

In this case the only component we need to leverage is Azure IoT Hub. The idea is to use Direct Methods, identifying the proper device using the IoT Hub Device ID, which is an aggregation of Tenant ID and Hono Device ID, as mentioned before. Here it is very important to define the payload of the messages to be sent to the devices, in order to understand how Direct Methods can actually be leveraged. For instance, it would be interesting to use some properties for diagnostic purposes.
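As a starting point for the payload discussion, here is a hedged Python sketch of how a Hono command might be wrapped in a direct method request. The `methodName`, `responseTimeoutInSeconds` and `payload` fields follow the request body of IoT Hub's direct method invocation as I understand it; the inner `data` and `diagnostics` layout, the `correlationId` property and the function name are purely hypothetical.

```python
import json
from typing import Any, Dict


def to_direct_method_request(
    command: str,
    data: Dict[str, Any],
    correlation_id: str,
    timeout_seconds: int = 30,
) -> Dict[str, Any]:
    """Wrap a command and its input data in a direct-method-style request body."""
    return {
        "methodName": command,
        "responseTimeoutInSeconds": timeout_seconds,
        "payload": {
            "data": data,
            # Hypothetical properties carried along for diagnostic purposes.
            "diagnostics": {"correlationId": correlation_id},
        },
    }


request = to_direct_method_request("setBrightness", {"level": 80}, "cmd-0001")
print(json.dumps(request, indent=2))
```

The target device is not part of the body; it would be selected through the combined IoT Hub Device ID when the method is invoked.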

Please let us know your thoughts and feedback

sophokles73 commented 5 years ago

Hi @erryB,

thanks for the detailed overview of the proposed architecture. Reading through it, several questions occurred to me. I will focus on the telemetry direction for now.

My understanding is that a single IoT Hub instance will be used for handling multiple Hono tenants, right? If so, is this for reasons of better resource efficiency or would it also make sense to use distinct IoT Hub instances per tenant?
The resources exposed by IoT Hub via AMQP 1.0 use specific addressing schemes which represent their semantics but also contain more technical aspects like partition and key. If so, I assume we will need to introduce an adaptation layer (either in Hono's protocol adapters or in front of IoT Hub's AMQP endpoints) which takes care of translating between Hono's resource address scheme and that of IoT Hub. This adaptation layer would then also be responsible for aggregating Hono's tenant and device ID into the IoT Hub Device ID, right?
I am not sure if I understand the role/purpose of the Event Processor correctly. My guess is that it implements Hono's northbound Telemetry and Event APIs which consuming applications connect to in order to receive telemetry and events. Based on this assumption, the Event Processor would also need to enforce the access policy, i.e. make sure that consumers can read only messages produced by devices that belong to the tenant that they are authorized for. If all tenants share the partitions available in the IoT Hub instance (by means of using tenant ID + device ID as the partition key), I wonder how the Event Processor can filter the messages based on the consumer's authorities.
In the diagram the AMQP 1.0 Hono endpoint component is drawn as if it were an existing Hono component. This is not the case (yet). In our existing setup with e.g. enMasse as the AMQP Messaging Network, consumers connect directly to enMasse's AMQP endpoint. When using IoT Hub, where would consumers connect to?
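To illustrate the access-policy question raised above, a minimal Python sketch of tenant-based filtering on a shared partition could look like this. The `authorities` mapping from consumer to authorized tenants, the consumer names and the message shape are all assumptions; in practice the authorities would come from Hono's authentication and authorization setup.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

TenantMessage = Tuple[str, bytes]  # (tenant_id, payload)


def authorized_messages(
    messages: Iterable[TenantMessage],
    consumer: str,
    authorities: Dict[str, Set[str]],
) -> Iterator[TenantMessage]:
    """Yield only the messages whose tenant the consumer is authorized to read."""
    allowed = authorities.get(consumer, set())
    for tenant_id, payload in messages:
        if tenant_id in allowed:
            yield tenant_id, payload


authorities = {"app-a": {"tenantA"}, "app-b": {"tenantB"}}
shared_partition = [("tenantA", b"21.5"), ("tenantB", b"19.0"), ("tenantA", b"22.0")]
print(list(authorized_messages(shared_partition, "app-a", authorities)))
# [('tenantA', b'21.5'), ('tenantA', b'22.0')]
```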

WDYT?

erryB commented 5 years ago

Hi @sophokles73

Thanks for your feedback, I think it leads to a very interesting discussion. Here I try to give a quick answer to your points.

We actually thought about having multiple IoT Hubs, one for each tenant, because we see some advantages in that. For instance, it would be easier to control the performance of each Tenant, with a clear view of message rate and throughput. The main reason why we started using a single instance of IoT Hub was related to the number of tenants: if that number becomes really big, then the management could be complicated. But if this is not the case, we can go for one IoT Hub per tenant.
Your assumption is correct: the protocol adapters would most likely need to be modified, in order to have an interface that outputs to the IoT Hub’s expected paths, identities etc. This could be also handled through another adapter, which would be the easy way in terms of implementation, but would require an additional component to be added to the architecture. Here the feedback from the community would be really useful.
If we go for one IoT Hub per Tenant, then we don’t need the Event Processor to perform the filtering action you mentioned.
If consumers connect directly with the custom endpoint and, of course, they cannot be modified to communicate directly with the backend, then I agree we need to create also an AMQP Endpoint component for protocol and path translation.

sophokles73 commented 5 years ago

The main reason why we started using a single instance of IoT Hub was related to the number of tenants: if that number becomes really big, then the management could be complicated. But if this is not the case, we can go for one IoT Hub per tenant.

I think for cost efficiency, it would still be desirable to be able to use a single instance for multiple tenants, e.g. for offering free plans.

This could be also handled through another adapter, which would be the easy way in terms of implementation, but would require an additional component to be added to the architecture.

FMPOV we would handle this in the HonoClient component which is managing the connection to the AMQP Messaging Network and which is used by the adapters to forward messages downstream. I can imagine introducing a configuration property there which would allow us to select Azure Addressing Scheme instead of Hono Addressing Scheme for downstream messages.
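A configuration-driven switch between addressing schemes could be sketched in Python as below. The configuration key `addressingScheme`, the enum and the function name are hypothetical, and the address formats are assumptions: `telemetry/<tenant>` is modeled on Hono's downstream addresses, and `/devices/<id>/messages/events` on IoT Hub's device-to-cloud path. Both should be checked against the respective documentation before relying on them.

```python
from enum import Enum
from typing import Dict


class AddressingScheme(Enum):
    HONO = "hono"
    AZURE = "azure"


def downstream_address(
    scheme: AddressingScheme, endpoint: str, tenant_id: str, device_id: str
) -> str:
    """Build the downstream target address for the configured scheme."""
    if scheme is AddressingScheme.HONO:
        return f"{endpoint}/{tenant_id}"
    # Azure scheme: address the combined IoT Hub device ID (':' is a hypothetical separator).
    iot_hub_device_id = f"{tenant_id}:{device_id}"
    return f"/devices/{iot_hub_device_id}/messages/events"


config: Dict[str, str] = {"addressingScheme": "azure"}  # hypothetical property
scheme = AddressingScheme(config.get("addressingScheme", "hono"))

print(downstream_address(AddressingScheme.HONO, "telemetry", "DEFAULT_TENANT", "4711"))
# telemetry/DEFAULT_TENANT
print(downstream_address(scheme, "telemetry", "DEFAULT_TENANT", "4711"))
# /devices/DEFAULT_TENANT:4711/messages/events
```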

If we go for one IoT Hub per Tenant, then we don’t need the Event Processor to perform the filtering action you mentioned

Obviously. However, if we do want to be able to share a single IoT Hub instance among multiple tenants then this would be the responsibility of the Event Processor component, right?

If consumers connect directly with the custom endpoint and, of course, they cannot be modified to communicate directly with the backend, then I agree we need to create also an AMQP Endpoint component for protocol and path translation.

I am not sure that I understand which endpoint you are referring to with the custom endpoint. Is this the AMQP endpoint provided by IoT Hub?

erryB commented 5 years ago

Thanks again for your interesting input and feedback @sophokles73

We need to take into account that a single instance of IoT Hub shared among multiple tenants would be good for extensibility and horizontal scale, while having multiple IoT Hubs, one per tenant, would be good in terms of vertical scale but would also increase the complexity of the solution. We can consider it as an iterative approach, thus our proposal is to focus on the first iteration, demonstrating how to connect Eclipse Hono to a single IoT Hub supporting multi-tenancy, so that we have a simple solution available in the short term and can explore scalability issues afterwards.

In order to connect the device to IoT Hub we need a pluggable interface able to recognize which cloud service the device wants to connect to. To implement this, we should work on two main points: first of all, we need to create this interface and use dependency injection to get the information necessary to connect, and then we have to implement the concrete connection to IoT Hub. It makes sense to leverage the component you already use as the interface, as you mentioned.

Regarding the Event Processor component and assuming we will implement the solution with a single instance of IoT Hub shared for multiple tenants, we need Event Processor to be able to filter the messages based on Tenant ID. Basically the Event Processor will be responsible for consuming messages from IoT Hub and pushing these messages towards the appropriate LOB Application. Since we need EP to push messages but we also want to keep compatibility with the standard behavior of LOB App, we also need to add a small component/extension between EP and LOB Application, in order to guarantee the same behavior as enMasse's AMQP Endpoint.

What do you think?

sophokles73 commented 5 years ago

We can consider it as an iterative approach, thus our proposal is to focus on the first iteration, demonstrating how to connect Eclipse Hono to a single IoT Hub supporting multi-tenancy, so that we have a simple solution available in the short term and can explore scalability issues afterwards.

Sounds good to me.

first of all, we need to create this interface and use dependency injection to get the correct information necessary to connect

We already use dependency injection for setting up the protocol adapters. We can use an additional configuration property to use the Azure specific client for using Azure IoT Hub as the AMQP 1.0 Messaging Network.

Basically the Event Processor will be responsible for consuming messages from IoT Hub and pushing these messages towards the appropriate LOB Application.

Understood.

Since we need EP to push messages but we also want to keep compatibility with the standard behavior of LOB App, we also need to add a small component/extension between EP and LOB Application, in order to guarantee the same behavior as enMasse's AMQP Endpoint.

Indeed, the LOB Applications connect to the AMQP Messaging Endpoint in order to consume messages, so either the EP itself (or the small component between EP and LOB app) will need to expose an AMQP 1.0 endpoint to which the apps can connect. However, I wonder why we would want to add an extra component there and not just let the EP expose that endpoint.

I do have a more generic question as well: is there a particular (compelling) reason why we should use Azure IoT Hub instead of the generic Event Hub? Based on the documentation I found online, it doesn't look like there is much difference when simply using them for downstream message forwarding, is there?

mhemmeter commented 5 years ago

From a service provider perspective - offering Eclipse Hono as a managed cloud service - this proposal leads to two non-technical questions:

Eclipse Hono requires an AMQP 1.0 Messaging Network for exposing its remote service interfaces to business applications - no more and no less. Using Azure IoT Hub as such a messaging network feels somewhat oversized due to the additional features it provides. Some go in the direction of other Eclipse IoT projects (e.g. Device Twins and Eclipse Ditto). The open question for me is whether there are any other features apart from the messaging aspect that Eclipse Hono could benefit from (given the scope this project has). If there are no other features, I would like to better understand why the existing messaging services cannot be used, so I have the same question as @sophokles73.
The second topic is pricing: Azure Event Hubs Dedicated has a fixed entry price of ~ $5,000 and seems to have no restrictions regarding the number of messages you can process (see https://azure.microsoft.com/en-us/pricing/details/event-hubs/). With Azure IoT Hub Standard Tier you get two S3 editions for the same price, limiting you to 600,000,000 messages of 4 KB (see https://azure.microsoft.com/en-us/pricing/details/iot-hub/). For more you have to pay more. So looking at the price, the first question becomes even more important, as Azure Event Hubs seems to be the cheaper choice. Hint: pricing is complex nowadays in the cloud, so there is probably a need to look into some concrete scenarios.
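To put the 600,000,000 message figure into a concrete scenario, the Python sketch below converts it into a sustained rate and a device count. It assumes the quota applies per day, which the text above does not state, and the 10 second send interval is an arbitrary example; both function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate(messages_per_day: int) -> float:
    """Average messages per second that exhaust a daily quota."""
    return messages_per_day / SECONDS_PER_DAY


def max_devices(messages_per_day: int, send_interval_seconds: int) -> int:
    """Devices supported if each sends one message per interval, all day."""
    messages_per_device = SECONDS_PER_DAY // send_interval_seconds
    return messages_per_day // messages_per_device


QUOTA = 600_000_000  # figure quoted above for two S3 units (assumed to be per day)

print(f"{sustained_rate(QUOTA):.0f} messages/s")   # 6944 messages/s
print(max_devices(QUOTA, send_interval_seconds=10))  # 69444
```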

I'm looking forward to your feedback regarding these questions.

erryB commented 5 years ago

@sophokles73 you are correct, the Event Processor component could directly expose the AMQP 1.0 endpoint, but in that case it would be necessary to change the endpoint for each different platform (e.g. not all platforms will have or require an Event Processor Host). Having two different components allows us to isolate the endpoint as an extension point and keep the Event Processor component simple, focused and efficient. These are the reasons behind our proposal; however, we can still decide to develop a single component tightly coupled to the Azure platform.

@sophokles73 @mhemmeter Regarding the differences between the IoT Hub and the Event Hub, I'll try to answer you both. Of course, there are many differences between the two, some more compelling than others in the scenario we are looking at here. The reason why we decided to leverage the IoT Hub was tied to the support for Direct Methods and Commands for the outbound connectivity. The other beneficial part is the device level authentication, but as mentioned an Event Hub could be used here, and in fact it was our first thought. Nevertheless, the command path was more appealing: without it, there is a need to introduce other persistence points such as queues (e.g. Service Bus). Introducing extra components, as you point out, means that there are considerations to be made, such as overall availability: the more components, the higher the risk of a single component failing and reducing your availability expectations. The other element here is that we must consider all of the other factors around service limits and quotas to understand them in their entirety.

@mhemmeter The discussion about pricing can be complicated. As you point out, there are different limits and pricing implications for each and every choice made. In the end, with either Event Hubs or IoT Hub the architecture will require an Event Processor host to consume messages. The decision to use one or the other ties to the requirement to manage and host scalability in Hono or offload it to Azure on the command side. A platform operator, a service provider and a solution builder will all have different preferences when it comes to pricing. Part of this gets into the operational vs. development vs. support costs. I agree with you that as a service provider the dedicated Event Hub option may be more attractive, but I guess we also need to take into account smaller solutions where it’s not possible to operate at 5K cost for a single part of the architecture.

mhemmeter commented 5 years ago

Hi @erryB, thanks for your feedback. Let me first focus on the differences between IoT Hub and Event Hubs (second paragraph above).

I understand that the main driver for the proposal of using IoT Hub is the command path (outbound connectivity) and I fully agree with your argument that we should try to limit the number of dependencies to a minimum. However, my understanding is that for device to cloud communication this proposal needs IoT Hub and, on top of that, Service Bus for processing Events ("offline fallback"). So just looking at the dependencies we are talking about "Event Hubs plus Service Bus" versus "IoT Hub plus Service Bus", aren't we? If this is true, we have Service Bus as a dependency anyway and could potentially use it for the command path as well.

Your second point of using device level authentication provided by IoT Hub is probably an interesting benefit. However this means that we need an implementation of the Credentials API in the system that makes use of this IoT Hub feature (see https://www.eclipse.org/hono/api/credentials-api/). Also we need an implementation of the mandatory operation of the Device Registry API (https://www.eclipse.org/hono/api/device-registration-api/, "Assert Device Registration") if we make active use of the IoT Hub identity registry. As you can see having the discussion in that direction leads to other questions compared to focusing on the messaging aspect we started with.

mhemmeter commented 5 years ago

Coming back to pricing today: The good thing with Event Hubs is that not only the dedicated offering is available but also the basic and standard tiers. So if you are running Eclipse Hono using Event Hubs as the messaging network in a cost sensitive context, I think you can do so by choosing the basic or standard tier.

erryB commented 5 years ago

Thanks @mhemmeter You are correct, we do need a Service Bus Queue for the offline fallback. However, if we decide not to leverage IoT Hub, we need to add not only another Service Bus Queue, but also a connection bounding mechanism, because EH has a maximum of 5K connections that can be established. Besides that, we would need to handle Cloud to Device messages ourselves because we couldn’t use Direct Methods to do so, even though the concept aligns pretty closely with Hono's requirements. Security and Device Identity are other elements we need to take into account. I also agree with you that the Protocol Adapters need some changes in order to create the IoT Hub Device ID, but some changes would also be necessary to handle the Event Hubs partition key. Regarding the price, IoT Hub also has different tiers to choose from according to your needs; here you can take a look at the official documentation. As I mentioned though, we actually considered Event Hubs and of course it’s still possible to choose it. I'm just trying to consider all the features and not just costs. I believe we should take complexity and dependencies into account to select the best option.

sophokles73 commented 5 years ago

Hi @erryB,

going through the documentation of the Event Hub, I wonder, if it is possible to use multiple topics with the Event Hub as you can with Apache Kafka. We could then use two topics per tenant (one for telemetry and one for events) in order to implement the downstream direction for multiple tenants using a single Event Hub instance.

WDYT?

erryB commented 5 years ago

Hi @sophokles73

The comparison of Kafka to Event Hubs features is described in this page. Basically, Event Hubs provides an endpoint which can be used by Kafka applications. In the overall evaluation, though, we also need to take into account some limitations which can become relevant in multi-tenant scenarios, like the limit of 100 Event Hubs Namespaces per subscription. If we want to focus on Event Hubs, I think that, in terms of optimization, a single instance of EH per tenant with different partitions for Telemetry and Events could be a better option. However, we also need to keep in mind that with Event Hubs we need a custom implementation of the Event Processor component and a separate mechanism to implement Command & Control messages.

sophokles73 commented 5 years ago

Ok, I see. So each Event Hub instance within a Namespace corresponds to a Topic within a Kafka cluster. My understanding is that I can create multiple (how many?) Event Hubs per namespace and, as you indicated, a limited number of Namespaces per subscription. For the moment, I just want to make sure, that I understand all options that are on the table ...

erryB commented 5 years ago

Yes that's correct. You can create up to 10 Event Hubs per EH Namespace and up to 100 EH Namespaces per subscription. Here you can find all the details and other limits.
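Taking the two limits quoted above at face value, a short Python calculation shows how many tenants fit into one subscription under the two layouts discussed in this thread. The function name is hypothetical, and the limits may differ by tier or change over time.

```python
EVENT_HUBS_PER_NAMESPACE = 10      # limit quoted above
NAMESPACES_PER_SUBSCRIPTION = 100  # limit quoted above


def max_tenants(event_hubs_per_tenant: int) -> int:
    """Tenants that fit into one subscription for a given per-tenant layout."""
    total_event_hubs = EVENT_HUBS_PER_NAMESPACE * NAMESPACES_PER_SUBSCRIPTION
    return total_event_hubs // event_hubs_per_tenant


print(max_tenants(2))  # 500: one Event Hub each for telemetry and events
print(max_tenants(1))  # 1000: one Event Hub per tenant, split by partition
```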

sophokles73 commented 5 years ago

@erryB thanks for the info, Erica. This is very helpful for considering which approach could/will work for us.