cloudevents / spec

CloudEvents Specification
https://cloudevents.io
Apache License 2.0

CloudEvents Subscription Use Case #767

Open erikerikson opened 3 years ago

erikerikson commented 3 years ago

Responding to Doug's request for a subscription-related use case... For context, the broad space of the use case is IoT.

The model that I have been promoting is a sort of ring on which different domain specialists sit, interacting with the external world (from a system perspective) and with one another. Each of these is actually business logic relating to the domain, packaged in microservices, also containing identity and proof assets, archives of event streams, and materializations of event streams.

Each domain specialist interacts with the sensors and actuators of their domain (e.g. device or users) as sources of event descriptions of observations/measurements/syntheses and commands for actuation (whether display specifications or motor target changes). Each domain specialist instance contains all the security configuration, rights, and subscription details over the set of associated domain entities and acts as gatekeeper to accessing those entities and as publisher of information produced by those devices and more specifically the sensors onboard them.

The next layer of consideration is then those interested in the products of one or more domains which have their subscriptions and control streams over domain entities. Particularly in a device management use case, there is a user domain specialist that presents device (and other) information to users that manage the devices to ensure their proper configuration and continuous proper function. Users view specific devices and summaries of device observations as well as ongoing updates (e.g. alarm activations or clearances, live metric streams, et cetera).

The domain interaction has some interesting cases but is relatively straightforward in terms of entity observations and command issuance; the transmission between ring participants is where the most interesting considerations come up. The device domain specialist (DDS) will have all device information available to it, including some that it may not even process (but will still archive for future unknown but probable business processes and knowledge requirements [we are a resource-constrained startup with expectations of increased resource availability as we scale]). A user domain specialist (UDS) will want a subset of all that information (available to the DDS for a subset of devices [e.g. current configuration and telemetry]) for presentation and notification purposes.

Looking closer, specific users will want a subset of that subset of all device information. The most specific subset of devices and specific data is dynamic, based on user activity. For maximal efficiency the UDS would continuously push the complete specific information-filtering configuration upstream to the DDS (and perhaps to devices) in real time, but this is probably unwise given the continuous updating of that interest as users direct their attention by clicking through pages. As is my wont, I use event order, stream joins, and log processing to achieve consistency in the system, and I care that guarantees of these properties are maintained throughout the system. To summarize: the data flowing from devices to users and from users to devices passes through authorization layers and filters, but for all authorized data there are multiple levels at which the data is coalesced, filtered, and delivered based on limits of local capacity to efficiently serve dynamicity within known and changing scopes of subscription.

Do we want subscriptions to the CloudEvents Subscription API updated on devices as a user navigates through pages, and even as they interact with adjusting a specific device graph? I'd guess not (certainly Azure currently makes this a deploy-time decision), but I think that is one extreme end of the spectrum at play here. The other end of the spectrum is the "degenerate" case described on a call as a behavior of "enterprise developers" who request that all data be delivered (until challenged). In this case, all device information, for all devices, would be subscribed to and locally filtered (in the browser if we want to be entirely ridiculous) down to just the data which is desired or actionable. Of course, instead of either of these ends of the spectrum, I would expect that the UDS will identify the device data that it supports users in interacting with. It will likely use shared materializations of that data (historically indexed by affecting offset [with TTL]) and subscribe to updates over the data for all devices. It will then manage distributing those updates according to local dynamic server-side filtering based on what users individually attend to, and additionally in the browser based on appropriate knowledge of the current presentation state. Therefore a first filtering would occur on data distribution from the DDS and then a second filtering and multiplexing would happen on the UDS side of the ring and even a third in the browser.
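To make the three layers concrete, here is a minimal toy sketch of that filter chain (all names, event types, and device IDs are hypothetical; this stands in for CloudEvents envelopes, not any actual API):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A minimal stand-in for a CloudEvent envelope (illustrative only).
@dataclass
class Event:
    type: str
    source: str
    data: Dict[str, object] = field(default_factory=dict)

Filter = Callable[[Event], bool]

def apply_layer(events: List[Event], keep: Filter) -> List[Event]:
    """One filtering layer; DDS, UDS, and browser each apply their own."""
    return [e for e in events if keep(e)]

# Layer 1 (DDS): only forward device event types the UDS subscribed to.
dds_filter: Filter = lambda e: e.type.startswith("com.example.device.")
# Layer 2 (UDS): only fan out telemetry for devices some user is viewing.
viewed = {"device-17"}
uds_filter: Filter = lambda e: e.data.get("device_id") in viewed
# Layer 3 (browser): only render the metric on the currently open page.
browser_filter: Filter = lambda e: e.data.get("metric") == "temperature"

events = [
    Event("com.example.device.telemetry", "/dds",
          {"device_id": "device-17", "metric": "temperature"}),
    Event("com.example.device.telemetry", "/dds",
          {"device_id": "device-99", "metric": "temperature"}),
    Event("com.example.billing.invoice", "/erp", {"device_id": "device-17"}),
]
for keep in (dds_filter, uds_filter, browser_filter):
    events = apply_layer(events, keep)
# Only the device-17 temperature event survives all three layers.
```

The point of the sketch is only that each layer is an independent predicate; no layer needs to know how the others filter.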

This is a simple statement of the simplest case of direct information flow and management in a domain-specialist-ring-based design. In more advanced cases we get into the tracking of transitive data flow chains, where the processing of one or more (i.e. joined) ordered event streams produces a new event stream that can itself be the subject of subscription/interest (e.g. the synthesis of alarms from telemetry data contingent on device configuration and status; those alarms displayed and updated in the presentations of one or more users). This can be vital to scenarios of effect-after-cause, or its single-hop version, read-after-write.

duglin commented 3 years ago

> Therefore a first filtering would occur on data distribution from the DDS and then a second filtering and multiplexing would happen on the UDS side of the ring and even a third in the browser.

The first two steps/layers would each turn into some kind of subscription, right? If so, would it be ok if it was modeled as independent subscriptions or are you looking for a way to have a single subscription encapsulate these multiple layers?

erikerikson commented 3 years ago

Independent subscriptions is my expectation. I think trying to mix them into a single subscription would probably be a mess.

To be clear, for device data flowing to the UI there are at least three hops (but potentially more): device to DDS, DDS to UDS, and UDS to UI, each of which I would expect to be independently specified as subscriptions. In that statement I was focusing on the filters and the subscriber side.
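Those three independent subscriptions might look roughly like the following. The field names loosely follow the shape of the CloudEvents Subscriptions API draft (id, sink, filters with dialects like `prefix` and `exact`), but the endpoints, IDs, and filter values are all hypothetical:

```python
# Three independent subscriptions, one per hop (device -> DDS,
# DDS -> UDS, UDS -> UI). Approximate sketch, not the spec itself.
device_to_dds = {
    "id": "sub-device-dds",
    "sink": "https://dds.example.com/ingest",
    "filters": [{"prefix": {"type": "com.example.device."}}],
}
dds_to_uds = {
    "id": "sub-dds-uds",
    "sink": "https://uds.example.com/ingest",
    "filters": [{"exact": {"type": "com.example.device.telemetry"}}],
}
uds_to_ui = {
    "id": "sub-uds-ui",
    "sink": "wss://ui.example.com/stream",
    "filters": [{"exact": {"subject": "device-17"}}],
}
# Each subscription is created and managed independently; no hop
# needs to know about the filters applied at the others.
subscriptions = [device_to_dds, dds_to_uds, uds_to_ui]
```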

clemensv commented 3 years ago

OPC UA Federation

In OPC UA and in DIN 92222, we've been (and still are, albeit slowly and intermittently) discussing such federation scenarios, where you have a firehose of data coming out of a production system and then you need to aggregate and filter that down to a relative trickle for various audiences.

From a CloudEvents perspective, I think all those different stages of filtering and aggregation are disjoint relationships. The component manufacturer is only going to get what the machine manufacturer gives them and is not going to have (technical) leverage over whatever comes out of the factory.

clemensv commented 3 years ago

@deissnerk and I are having somewhat related discussions about the integrations between our respective systems.

On Klaus' side, there are very large business apps where a tenant might handle many millions of individual business relationships, and those apps can potentially fire very large torrents of events. We want those apps to be event-driven-extensible with apps that run in our serverless environments, but to enable this flow we obviously don't want every event that could potentially be fired to actually be raised and delivered to Azure's doorstep only to be discarded in the eventing system for lack of subscribers, dropping billions of events on the floor every day.

That means we need some way to turn the "event pipe" on or off.

Propagating each subscription upstream from the ultimate subscriber to near the original source is an option, but causes a lot of traffic and management interdependencies. Even with optimizations, that is still fairly complicated.

What is needed is a balance between keeping the inter-application mechanics manageable and not flooding the channels with unwanted events. It might well be that this is not a mechanism we define here in CloudEvents, but a knob that is turned on the application side.

For instance, you might enable event flow for a particular module and for a particular tenant for a specific event type in the source application and then choose to pipe those over to the environment where extensions will be hosted. Whether there are or will be subscribers for those events might just be an out-of-band negotiation.
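The per-module, per-tenant, per-type "event pipe" knob described above could be sketched as a gate checked at the source before any event is emitted toward the eventing system. All names here are hypothetical, not part of any CloudEvents or Azure API:

```python
# The pipe registry: flow is off unless explicitly opened for a
# (module, tenant, event type) triple.
enabled_pipes: set = set()

def enable_pipe(module: str, tenant: str, event_type: str) -> None:
    """Open the pipe, e.g. as part of out-of-band subscriber negotiation."""
    enabled_pipes.add((module, tenant, event_type))

def should_emit(module: str, tenant: str, event_type: str) -> bool:
    """Checked at the source: drop the event before it ever leaves
    the application unless the pipe was explicitly opened."""
    return (module, tenant, event_type) in enabled_pipes

enable_pipe("billing", "tenant-42", "com.example.invoice.created")
# Events for tenant-42's billing module flow; all others are
# suppressed at the source rather than discarded downstream.
```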

erikerikson commented 3 years ago

I think these are great use cases, in contrast with the use case of not wanting to propagate subscriptions from a user's browser to a remote and intermittently connected IoT device. It seems likely that the transitivity of subscriptions is something the specification should probably avoid expressing any opinion about, but it's not clear to me what demons could lie therein. Still, if all the events I subscribe to show up in order (accepting that some orders are poorly defined) in a reasonably timely manner, do I care? Perhaps stated differently: the policy here must be sensitive to the incentives at play. For intermediaries such as yourself, a means of cost sharing seems appropriate to align incentives. Crispness about whose right and responsibility such decisions are seems significant.

I would expect your (Clemens') systems to, at minimum, subscribe to a union of all the subscriptions requested of you. In some cases it may solve pain points and optimize multi-dimensional costs to transparently over-deliver. In other cases complete delivery may be part of the system requirements (e.g. digital twins), while in others over-delivery might simply be more efficient (e.g. when there is duplication across private subscriptions on a public data stream, or where latency and/or the volume of subscription updates impacts delivery responsiveness, as when attempting to subscribe browsers to intermittently connected device event streams).

Regarding Klaus' systems I would expect that the original sensors are likely to report centrally where a distribution clearing house can archive all original observations and propagate those for which there is some valid subscriber. That subscription point might be the point of subscription for your own systems.

Far more interesting to me personally are the information theoretic guarantees I get (or not) from the sources and every intermediary in the delivery chain. This is why I tend to favor Kafka, EventHub, Kinesis, et cetera and avoid EventGrid, EventBridge, DataFlow (the newish ordered mode may invalidate this), service buses, et cetera. In one sense this violates the anonymity of eventing while at the same time respecting it. In some ways it feels similar to not talking about someone else's behavior but talking instead about my needs and experiences (the mileage on that metaphor is sure to vary wildly).

Relevant factors (off the top of my head):

  1. local resource constraints (how are workloads aligned to relative availability adjusted costs?)
  2. subscription propagation time (how long until the source system knows it should produce?)
  3. subscription activation time (does it require manual approval?)
  4. event delivery delays (how long after production do events arrive?)
  5. subscription create/update/delete frequency and/or size (how frequent are client changes?)
  6. value of event timeliness (how much does missing events during a propagation/activation/delivery window cost?)
  7. event confidentiality (what regulatory regimes is the data subject to?)
  8. event delivery guarantees (must order be guaranteed? how must order across partition keys or sources be established?)
  9. event delivery costs (how many events of what sizes and what costs to process and/or propagate?)

I'm sure there are more that tired-at-the-end-of-the-week me is missing. We could likely assemble these factors into a formula to weigh the costs on one side against those on the other, identify prominent scenario clusters, and distill this all into guidance, but it seems impossible to declare a single answer, and it is not clear to me that we could even present an array of answers well.

The following seems random but important. During calls I have observed people speaking as though subscriptions through intermediaries form a chain of custody, with events passing from one to the next and so on. I have also observed people speaking as though subscriptions are a little more like matchmakers, configuring sources to drop events directly onto sinks. I can imagine moving subsets of available transformation algorithms (and potentially related materializations) to sources and sinks such that a hybrid of chain of custody plus matchmaking is the composite result. Each of these scenarios has relevant use cases: consider, for the matchmaking case, the transmission of highly confidential data; on the other end, a series of sufficiently trusted event processors where each enriches the output of the former so that the final delivered event has more value. Although we clearly allowed for custody chains in our discussions, it's not clear that everyone is consistent in how we are thinking of this in our discussions on subscriptions.
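The two delivery shapes mentioned above can be contrasted in a toy sketch: a chain of custody, where each intermediary receives, transforms, and forwards the event, versus a matchmaker, which only wires a source to a sink and never touches payloads. All names and the enrichment stages are hypothetical:

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
Stage = Callable[[Event], Event]

def chain_of_custody(event: Event, stages: List[Stage]) -> Event:
    """Each intermediary enriches the event before passing it on."""
    for stage in stages:
        event = stage(event)
    return event

def matchmake(source_events: List[Event], sink: List[Event],
              route: Callable[[Event], bool]) -> None:
    """The matchmaker only configures the route; events flow
    source -> sink directly, without intermediary processing."""
    sink.extend(e for e in source_events if route(e))

# Custody chain: each trusted processor adds value to the event.
enriched = chain_of_custody(
    {"type": "alarm"},
    [lambda e: {**e, "severity": "high"},
     lambda e: {**e, "ack_url": "/ack/1"}],
)

# Matchmaking: direct delivery, suitable for confidential payloads.
sink: List[Event] = []
matchmake([{"type": "alarm"}, {"type": "heartbeat"}], sink,
          route=lambda e: e["type"] == "alarm")
```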

duglin commented 3 years ago

On the 6/17 call, @erikerikson said he'd look into creating a PR.

erikerikson commented 3 years ago

@duglin you probably meant to comment this on #699 - I'm not sure what you want to do with this one.