dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License
10.06k stars 2.03k forks source link

Document integration of non-Orleans stream producers/consumers #2784

Open sergeybykov opened 7 years ago

sergeybykov commented 7 years ago

The main use case for Orleans streaming with persistent queues, such as EventHub, Azure Queues, Kafka, SQS, is where consumers and producers are grains or Orleans clients. However, it is also possible to have non-Orleans producers and consumers integrated into Orleans streaming.

Need to document steps required for integration of non-Orleans stream producers and consumers into Orleans streaming.

jason-bragg commented 7 years ago

it is also possible to have non-Orleans producers and consumers integrated into Orleans streaming.

This is only partially true. The streaming infrastructure does not support this in a general way, but some stream providers do support this type of behavior to various degrees. To date, the only stream provider that explicitly supports this is the eventhub stream provider, and it only supports reading non-Orleans stream data. We should of course document these capabilities, but it should be understood that non-Orleans streams has not been a core requirement for most stream providers.

To support non-Orleans data using the eventhub stream provider, users will need to define a stream provider with a custom EventHubAdatperFactory that uses a custom EventHubDataAdapter to convert their data formats to cached data structures, sequence tokens, and batch containers.

We do not currently have a full example of this but a partial example is the StreamPerPartitionEventHubStreamProvider used in EHStreamPerPartitionTests. It shows how to implement a custom EventHubDataAdapter which treats all events in an eventhub partition as a single stream.

creyke commented 7 years ago

In the case of the RabbitMQ stream provider (WIP) our internal use cases cover both streaming from Orleans to Orleans, as well as consuming-only of non-Orleans streams (as per the event hub provider). I can certainly see value in producing non-Orleans streams too, although I'm sure this requirement would be rarer.

I am keen to ensure full compliance with Orleans concepts however, including how consumption of external streams is modelled and customised.

I believe the Rabbit/AMQP protocol might see more usage of external streams (especially but not exclusively on premises) over event hubs, simply as more legacy systems / SOAs use this protocol than event hubs, so I think it's important to get the abstraction right. I welcome docs on this.

jason-bragg commented 7 years ago

I think it's important to get the abstraction right.

Agreed, and this is what gives me pause.

The EventHubStreamProvider was initially developed by 343 and ported to Orleans. It is very flexible and customizable because such was necessary to allow the port to replace the 343 version and still use the application specific logic from the original provider. This is why the data format, checkpointing, monitoring, and error handling are all replaceable. While this is great for flexibility, it comes at the cost of complexity.

Taking these patterns and making them the official patterns we want to use is probably unwise. As I stated before, supporting external data has never been a requirement of the streaming system, and if we want to make it one (which I do think it should be), I advocate taking the time to design a solution to this problem and apply it uniformly across our stream providers, rather than addressing it per stream provider in the ad-hoc fashion we've thus far taken.

creyke commented 7 years ago

Please see RabbitMQDefaultMapper and IRabbitMQMapper as examples of custom business logic which may need to be injected when consuming AMQP streams from non-Orleans sources.

This is just a first try to get things up and running so I'm more than happy to work with / move towards a standard once defined.

Certain functionality in the current IRabbitMQMapper is closely tied to the nuances of partitioning non-Orleans AMQP streams while retaining FIFO (binding different routing keys to multiple queues) such as in GetPartitionKeys(), so I fear it may be difficult to create a standard interface for all stream providers, but maybe a base one can share common methods, which can then be extended where required...

xiazen commented 7 years ago

I think I see two issues in this thread.

One is documentation for integration of non-Orleans stream producers/consumers. IMO, we should add documentation for this feature as long as we support this and there's users want to use this feature. Although as @jason-bragg pointed out, only a subset of streaming provider support this pattern and this is not a main pattern Orleans streaming support currently, we should make it clear in the documentation about this point.

Another issue is the design concern in Orleans streaming. Which is should we support external data naturally in streaming system or just make the feature available for a specific set of providers. I think this is debatable. and can be discussed in another thread.

Some thought on the documentation. I think except for texts in the doc, we can get a full example of the feature in Orleans tests, which can serve as a code snippet and also test this feature.