Support multiple saga instances to be activated for the same message

andreasohlund commented 12 years ago

Right now we don't support multiple saga instances being activated by the same message. To do this we must modify the ISagaPersister interface to allow the Get(string property..) (https://github.com/NServiceBus/NServiceBus/blob/master/src/core/NServiceBus.Saga/ISagaPersister.cs#L37) to return a IEnumerable instead.

If we find that we have a unique constraint for the given property we would use the safer and better performing way of fetching the saga mentioned here:

https://github.com/NServiceBus/NServiceBus/pull/268

In order to support this in a safe way for our users we must consider batching when the number of returned sagas is greater than X. Otherwise there would be to much things going on inside a single transaction and the risk for a TX timeout would be high. To support this we could do a bus.SendLocal() for each saga instance putting the saga id and type in the headers. This would effectively make each instance execute in its own transaction.

Note: To do this we must avoid our own conventions since it's not allow to "send" events.

lcorneliussen commented 12 years ago

This can be done easily by creating a handler that handles the events, finds corresponding sagas and then passes on single (new) messages to the saga.

I really think NSB shouldn't get to deep into that business. It is a business concern, not a technical one.

Making it more convenient to program such things would be ok though.

ramonsmits commented 11 years ago

@lcorneliussen So how would you find those saga's? Query the saga storage? Wouldn't that be wrong?

lcorneliussen commented 11 years ago

Well, that's what a saga finder would have to do anyway, well?

ramonsmits commented 11 years ago

I now have a handler that queries the saga database table for saga ids and convert the message to a saga specific command but this really feels pretty dirty. First of all for querying a set that I should do in batches or streaming and second for having this logic now coupled with the saga.

lcorneliussen commented 11 years ago

Why do you need the SagaId? Couldn't you create "business" messages that use the sagas UniqueId for correlation?

shekelator commented 11 years ago

I'm not a contributor or an expert on NSB (just a user), but I think the idea of starting multiple sagas from a single message is an unfortunate omission. It makes a lot of sense to be able to do that, and creating new messages that kick off multiple sagas is a lame hack. Is there any chance this will make it into an upcoming version?

I apologize that I'm not prepared to actually submit a pull request for this.

kblooie commented 11 years ago

I agree that there is friction here when you have a small number of sagas to update. It's really when you get to the larger number of instances that this idea has issues.

There are various problems with loading multiple sagas as part of one message. First, you wouldn't be able to reliably participate in transactions. If your one message tries to load and modify a million saga instances, then participating in a single transaction is most likely out of the question. Another thing is that you lose the ability to parallelize. If you send multiple messages, you can handle each of them in parallel across boxes. Not so with a single message.

I think that one way to deal with this friction would be to reduce the overhead of mapping one message to another. It would be nice to have an autoMapper like API that you could use to handle a message to map to another. This could add a "mapping handler" in the background. It would keep you from having to write separate handler classes just to map one message to another.

Thoughts?

danielmarbach commented 11 years ago

Why not just introduce another endpoint?

lcorneliussen commented 11 years ago

@danielmarbach I think that is exactly what @kblooie is suggesting - just that the endpoint (message handler) is created automagically.

A saga is an aggregate root (in most cases) and as such a consistency and transaction boundary. I think it is plain wrong to address multiple of them in the same tx. You can still do that with Bus.Send(object[]) - but that is a different story.

andreasohlund commented 11 years ago

Hey, I did some thinking and having a message dispatch to multiple sagas is quite trivial. I've fixed this on the multisaga branch. Can you guys take it for a spin and see what you think?

I've modified the timeoutmanager sample to show the usage:

https://github.com/NServiceBus/NServiceBus/blob/multisaga/IntegrationTests/TimeoutManager/MyServer/Saga/SimpleSaga.cs

Just hit "s" a couple of times to place orders and then hit "c" to fake a "customer made preferred" event. When that happens that same event is dispatched (as separate messages) to the different saga instances.

@udidahan what is your take on this?

udidahan commented 11 years ago

I don’t see anything requesting a timeout.

andreasohlund commented 11 years ago

The timeout was just a leftover from the old code. Is there anything in particular around timeouts that would have an effect on this?

lcorneliussen commented 11 years ago

I think it is wrong to apply a message to multiple sagas within one technical transaction. They should be independent. What about delivering them by resending them with a SagaID header?

andreasohlund commented 11 years ago

Isn't that exactly what I'm doing? One bus.sendlocal per saga instance found?

lcorneliussen commented 11 years ago

wonderful! sorry :)

andreasohlund commented 11 years ago

I think we should have a limit on how many instances we allow to avoid flooding the queue with messages if there is millions of saga instances. We can of course make this limit configurable. Thoughts?

johnsimons commented 11 years ago

Awesome :) Regarding limits, have u done any performance analysis? I would not add any limits unless really required.

I assume this will be out in v5 as it is a non backwards change?

ramonsmits commented 11 years ago

Having a limit would be strange to have here. The only problem I can think of is that the key resultset returned on the saga id finder would be to big resulting in large memory consumption. Maybe better to page through the result to minimize such issues?

johnsimons commented 11 years ago

I don't think memory is the issue here. The problem is tx timeouts, because the query it can be done within a tx. And I agree we need to do paging specially for raven.

kblooie commented 11 years ago

Why do a sendLocal? Could we just send to the master endpoint instead? That way you wouldn't need the limit.

On Jun 12, 2013, at 5:00 PM, Ramon Smits notifications@github.com wrote:

Having a limit would be strange to have here. The only problem I can think of is that the key resultset returned on the saga id finder would be to big resulting in large memory consumption. Maybe better to page through the result to minimize such issues? On Jun 12, 2013 9:52 PM, "Andreas Öhlund" notifications@github.com wrote:

I think we should have a limit on how many instances we allow to avoid flooding the queue with messages if there is millions of saga instances. We can of course make this limit configurable. Thoughts?

— Reply to this email directly or view it on GitHub< https://github.com/NServiceBus/NServiceBus/issues/298#issuecomment-19350963>

.

— Reply to this email directly or view it on GitHubhttps://github.com/NServiceBus/NServiceBus/issues/298#issuecomment-19358622 .

johnsimons commented 11 years ago

SendLocal already does that if there is a master

lcorneliussen commented 11 years ago

1) multidispatch doesn't make any sense for IAmStartedBy messages - should be caught, i guess. on the other hand, the result length will never be other than 0 or 1 in that case...

2) the first finder that returns more than one saga will lead to a multidispatch, while multiple finders returning one item each lead to multiple invocations (if i get it right :-))

why not collect all saga entities from all finders + unique...

3) for multidispatch it is not neccessary to load the full entity - the id (from unique property or index) is enough. especially if we have many, the impact on performance will be significant...

lcorneliussen commented 11 years ago

I'm still not sure if this alltogether is a desirable feature. The fact that it is possible (and work has started) is no good argument :-)

What is the real case? dispatching to a list of sagas will ALLWAYS be stale. this is a business concern and should be coded in a handler, if so. I'm pretty sure, if you have a million sagas that all need to be informed about a single message arriving, you do something seriously wrong.

johnsimons commented 11 years ago

+1 for use case for this feature?

danielmarbach commented 11 years ago

You can use the streaming api to do unbounded result sets. No need for paging

johnsimons commented 11 years ago

In raven v2?

johnsimons commented 11 years ago

We still need to do paging for nhibernate

johnsimons commented 11 years ago

The more I think about this feature the more I agree with Lars. As Lars point out the saga data is stale, so in your example telling all sagas to update themselves to set customer as preferred is pointless as by the time you are going to bill the customer the customer may not be preferred any longer! I know that is just an example but I'm still failing to see a real usage for this feature, anyone?

SimonCropp commented 11 years ago

Just my 2 cents. I asked around the team here and no one could think of a legit business case for this. Not to say one doesnt exist :)

danielmarbach commented 11 years ago

Ah that's a 2.5 feature :(

johannesg commented 11 years ago

I also agree with Lars. I really wanted this feature a while back and even tried to implement it then. But I got into the same problems as you do now regarding querying, paging and stale data and decided it wasn't worth it.

udidahan commented 11 years ago

Here’s one:

http://stackoverflow.com/questions/11567706/multiple-saga-mapping-of-a-single-message/11591423

Not that the solution couldn’t have been designed a different way, but I think there’s only so far our opinion should go. We’ve almost always had a lower-level API that allowed people to circumvent the opinions of the top level.

So, I’d have this as something that needs to be explicitly enabled in the initialization code rather than default behavior.

lcorneliussen commented 11 years ago

It is funny, how long time it takes to unlearn. Limits in NSB should continue help unlearning :) A year ago I couldn't understand how NSB could dare not to have this feature. Now I find it undesirable. I even wrote a DirectSagaAccessor that bypassed NSB and just (transactionally) called into saga-classes to be able to change state in a more performant manner. Well now, I can't imagine why I even tried; needing that, the my design was plain wrong.

@udidahan I'm sure Riccardo found a solution, or if not, will find one. Sagas today are 100% predictable - two starters: one will fail, parallel processing: etags ensure serialized processing. I think he'd also expect our solution to be processed in one transaction. Why wouldn't he just fire of one (new) message to each of the item sagas? (he says he needs excelent performance - which he won't get when we do sendlocal with tons of messages)

Letting the infrastructure come in and do a query* is messing up with business concerns that should be carefully handled. Will the set of items in the collection change? How/where is the cut? When should the items at latest get to know about that. It the collection is 'completed' - can it be reopened? what should that lead to? To many question to let the infrastructure just handle it as if it was no problem.

(*) whichs result will be stale: a) WaitForNonStaleResults() for 1000 items is a bad idea b) it will not protect the saga from getting stale between querying and message processing

On the UI we are able to Load a list of sagas by an array of unique property values - that is helpful and could be made available in NSB. But I still wouldn't use that for transactional processing...

ramonsmits commented 11 years ago

We have a valid business reason as well.

We have certain tasks that are delayed. This delay is variable depending on a certain rules. These tasks are now sagas. In certain conditions a set of these tasks (of a customer) need to be ran immediately.

Our current saga flow is

Process data
Check if it requires a delay
Schedule a timeout
When the timeout passed do follow-up

If the delay rule changes for a customer all current active saga's need to respond and take appropriate actions as in either immediately start the follow-up or reschedule. This is what the business wants.

It doesn't matter if it takes 15 minutes before the event is processed by a specific saga instance as these delays are usuallly much further in the future as in hours or days. This is logical as saga's are often long running processes.

Maybe our solution is not designed correctly but then raises the question. How to solve this if processing an event is not appropriate for a set of saga instances? To me that sounds the same as not allowing multiple messages handlers or only allowing one subscriber.

If someone has the requirement to update a tremendous amount of saga instance within a short time period to not break its SLA then they probably need to invest in a large set of worker nodes and/or alter their SLA.

chrisbednarski commented 11 years ago

@ramonsmits could you reverse the relationship? Have a saga instance per delay? One for immediate, one for delayed, etc? Then, have a list of task ids (or customer ids) in each saga instance? When a delay changed event is emitted, each saga instance (one per delay type) would update itself (add/remove the task id). Would this reduce the number of saga instances that need to be updated?

ramonsmits commented 11 years ago

Yes, it would be possible to have a saga per customer. But then this saga will contain an large set of keys (of delayed items) to signal. It is possible but such a saga would run for ever. Maybe not a big issue but we do not create saga's that run forever and use regular message handlers for such logic.

But here you have the same problem. One message would be processed by the saga that processes the item and the saga that keeps a list of item keys to signal. The difference here is ONE message needed to be handled by TWO saga's or different type. I don't know if that is supported. My guess would be yes... but never had such an implementation before within the same service.

-- Ramon

On Thu, Jun 13, 2013 at 11:32 AM, chrisbednarski notifications@github.comwrote:

@ramonsmits https://github.com/ramonsmits could you reverse the relationship? Have a saga instance per delay? One for immediate, one for delayed, etc? Then, have a list of task ids (or customer ids) in each saga instance? When a delay changed event is emitted, each saga instance (one per delay type) would update itself (add/remove the task id). Would this reduce the number of saga instances that need to be updated?

— Reply to this email directly or view it on GitHubhttps://github.com/NServiceBus/NServiceBus/issues/298#issuecomment-19381352 .

chrisbednarski commented 11 years ago

If the number of delay types is low (two or three), they could be treated as separate BCs. Each BC subscribes (separately) to the delay changed event and deals with it accordingly. Anyway, I don't think there's much choice. Either, you have thousands of sagas with small amount of data in each one (and try to update them in a single transaction) or have a single saga with lots of customer ids.

PS. I think never-ending sagas are just as good as ones that comlpete.

ramonsmits commented 11 years ago

There are just two delay types, either it is delayed or not :-) and the delay itself is calculated.

We now have a saga per delayed item. This was to implement a business requirement. The business requirement changed later on to not delay any longer at a certain business event.

If the infrastructure would support triggering an event to a bound set of saga instances implementing and realising this would have been literally minutes but unfortunately it does not so it requires a lot of extra work.

The implementation that Andreas has provided would work and behaves the same as any other method. It is like having 10.000 subscribers. When an event is published the same thing happens by accessing the subscription storage to retrieve a set of subscriber end-points whereas in the service you retrieve a set of instances.

-- Ramon

On Thu, Jun 13, 2013 at 12:28 PM, chrisbednarski notifications@github.comwrote:

If the number of delay types is low (two or three), they could be treated as separate BCs. Each BC subscribes (separately) to the delay changed event and deals with it accordingly. Anyway, I don't think there's much choice. Either, you have thousands of sagas with small amount of data in each one (and try to update them in a single transaction) or have a single saga with lots of customer ids.

PS. I think never-ending sagas are just as good as ones that compete.

— Reply to this email directly or view it on GitHubhttps://github.com/NServiceBus/NServiceBus/issues/298#issuecomment-19383681 .

ramonsmits commented 11 years ago

By the way, an additional scenario would be a notification system. A subscriber wants to receive specific notifications during certain periods. The corresponding saga instance decides if an event needs to be signalled to the subscriber. It can easily be that 'one' event be notified to a very very large set of subscribers.

Maybe this would require a different form of implementation but I think it is legit.

lcorneliussen commented 11 years ago

then we are into "semantic subscriptions" and/or content routing. i think that would be rather desirable.

andreasohlund commented 11 years ago

Even though the multisaga thing can be a sign that the design needs to change (I agree with you Lars) I think there is value to provide the feature for people transitioning to that better design. This is similar to how we provide callbacks for req/response style interactions. This of course implies that we bring this to the users attention and also provide a path (doco) for the users to design their way out of the need for multi sagas.

How about we have the feature off by default and explain to them the dangers when they enable it?

Also by making the custom sagafinders have access to the underlying persistence mechanism you should be able to control the staleness/locking etc of the resulting query to fit your business requirements? (at least close enough)

Eg if I want the CustomerMadePrefered event wake up all orders that is not in the completed state I can create my custom saga finder to do just that.

ramonsmits commented 11 years ago

Well, if the design is flawer then how should you implement such requirements? A seperate saga that aggregates all the key data? Is that really a better design?

andreasohlund commented 11 years ago

Marking this as 5.0 so that we don't forget to make the call to include this or not

SimonCropp commented 10 years ago

@andreasohlund so was the call made to "not include this in 5.0"?

andreasohlund commented 10 years ago

Correct

Sent from my iPhone

On 31 May 2014, at 14:43, Simon Cropp notifications@github.com wrote:

@andreasohlund so was the call made to "not include this in 5.0"?

— Reply to this email directly or view it on GitHub.

SimonCropp commented 9 years ago

@andreasohlund will this be easier with your meta model?

andreasohlund commented 9 years ago

Yes, but that is not the main issue here. The challenge is to implement it in a safe way for user.

An implementation is available here: https://github.com/Particular/NServiceBus/tree/multisaga

andreasohlund commented 9 years ago

We've decided to not support this since we can't provide a solution that would work for all scenarios. The workaround is to create your own lookup table and dispatch a command to each saga instance individually

Closing this one as won't fix

Particular / NServiceBus

Support multiple saga instances to be activated for the same message #298