Particular / NServiceBus.Transport.AzureServiceBus

Azure Service Bus transport
Other
22 stars 19 forks source link

Incorrect routing of messages in ServiceBus #509

Open gusmanb opened 2 years ago

gusmanb commented 2 years ago

When ServiceBus is used as transport and pub/sub is used the bundle-1 topic subscriptions are created with a SQL filter like this:

   [NServiceBus.EnclosedMessageTypes] LIKE '%(full class name)%'

This is totally incorrect, suppose the case where you have these events:

   MyNamespace.MyEvent
   MyNamespace.MyEventTwo

As the filters uses the LIKE clausule, and even worse, percentages are explicitly added to the filter, all the MyEventTwo events will be also routed to all the queues registered for MyEvent which at the end causes a "Handler not found" error as there is no handler for such messages.

The fitlers shoud NEVER use a LIKE clausule, they must use equality comparers:

   [NServiceBus.EnclosedMessageTypes] = '(full class name)'
danielmarbach commented 2 years ago

Hi @gusmanb

Thanks for reaching out to us and raising your concerns. I took the liberty of moving it to the correct repository. I can relate to your skepticism when you look at the current filter rule from a technical and puristic standpoint. You are indeed correct when defining event that have the same "base name" they will get routed due to the LIKE condition and the capturing % at the end.

This is though a feature of the topology that we explicitly designed for. The reason being that NServiceBus as a higher level framework, and as such it is trying to push you into the direction of declaring meaningful business events. MyEvent and MyEventTwo are rarely ever meaningful business events. When following the guidelines you would end up with event names like Shipping.OrderShipped or Sales.OrderAccepted which by the virtue of the naming doesn't make it likely to clash.

Let's take the above example to the next level. Assuming we have a situation like Sales.OrderAccepted and Sales.OrderAcceptedTwo (to follow your example). Sales.OrderAcceptedTwo would then fall under the same "baseName" of Sales.OrderAccepted and considered by NServiceBus as a version of the Sales.OrderAccepted event. Therefore, routing it to the subscriber of Sales.OrderAccepted enables evolving the contract without falling into the problem of assembly versioning.

Useful guidelines

danielmarbach commented 2 years ago

@gusmanb let me know if the current behavior of the Azure Service Bus transport poses a problem for one of your endpoints, and we might be able to find a suitable workaround for that specific endpoint. From a behavior perspective though in general this works as intended as I tried to highlight above, and I will be closing this issue as soon as I have received your feedback.

gusmanb commented 2 years ago

Hi.

Yes, I agree that the example is not meaningful, in our real case we have "NativeMessages.ExternalCall" and "NativeMessages.ExternalCallResult" which are meaningful and caused the results to be routed to the services that receive the call. Of course that is very easy to solve, changing "NativeMessages.ExternalCall" to "NativeMessages.ExternalCallRequest" (what we already have done) solves the problem.

But, said that, if this is intended by design to force users to follow best practices, should not NServiceBus throw an error when it finds this situation at least when autosubscription is enabled? It could check the subscriptions in the "bundle" topic and if it is going to register a message that would collide with an already registered one then throw the error.

The main problem I see is that for someone that doesn't known this "feature" it's a lot of time wasted until you find what is happening, without an error message you are clueless and to be honest I can't find anywhere in the documentation that states that messages will be matched by partial class names, so what you expect is exact matches, not partial ones.

Also, if you consider the current status you have two pieces of the same system that don't follow the same policy, routing in ServiceBus matches partial class names, but NServiceBus itself matches handlers by exact class names, what as I see it is not good practice.

Cheers.

danielmarbach commented 2 years ago

@gusmanb

I agree that the example is not meaningful

Apologies, it wasn't my intention to play down your example. I truly understood it was a generic one for the discussion, and as such an excellent example to get started with.

Based on your feedback I have raised a documentation PR to make it visible on the topology documentation how things are behaving at the moment. I have also added there more details about why that also includes the stronger reason why we introduced it which is the polymorphic behavior (sorry it only occurred to me while I started writing the documentation update).

Regarding your concrete example. Thanks for that! When we talk to customers in solution architecture discussions, a scenario like you described is usually advised to be treated as a request/reply scenario and not a pub/sub scenario. The reason being it fits closer to the definition of

This would make the NativeMessages.ExternalCall a command (ICommand) and NativeMessages.ExternalCallResult a result (IMessage).

image

https://docs.particular.net/nservicebus/messaging/messages-events-commands

SeanFeldman commented 2 years ago

Given the message type header contents, how would you suggest to construct the filter, @gusmanb?

SeanFeldman commented 2 years ago

To add more details, how would you propose to handle SQL filters (correlation filters won't work, tested already) given that a message could be in a chain of inheritance and polymorphism has to be respected?

Here's an example of what an event type header (NServiceBus.EnclosedMessageTypes) value could look like:

Contracts.Events.Api.FileChanged, Contracts, Version=1.0.0.0, Culture=neutral, 
PublicKeyToken=null;Contracts.Events.Api.FileSyncEvent, Contracts, Version=1.0.0.0, Culture=neutral, 
PublicKeyToken=null;Contracts.Events.Api.SyncEvent, Contracts, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
gusmanb commented 2 years ago

Apologies, it wasn't my intention to play down your example.

Nothing to apologize for :)

Here's an example of what an event type header...

Ok, now I start to understand your problem, I had not checked the header contents but it starts making sense why you did the filter like these, to allow the register of a handler for a base class that would receive events of any child type.

Let's break the problem in two steps, delimiting the end and delimiting the start of the class name.

Delimiting the end of the class name should be easy, if the filter instead of '%(className)%' becomes '%(className),%' will solve this step, as the header includes the class assembly qualified name we can rely on that final comma at the end of the name to do the delimitation.

To delimit the start of the class name I see the header contains the class assembly qualified name and if I undersstood it correctlly its ancestors assembly qualified names joined by a semicolon with no spaces, if that's the case then we can rely in the beginning of the string or the semicolon using an OR: ... LIKE '(className),%' ... OR LIKE '%;(className),%'.

Basically a complete filter for a class named MyNamespace.MyEvent would look like this:

[NServiceBus.EnclosedMessageTypes] LIKE 'MyNamespace.MyEvent,%' OR [NServiceBus.EnclosedMessageTypes] LIKE '%;MyNamespace.MyEvent,%'

I'm out of the office so I can't test this until next week but I think that would solve all the problems at once, I will try to get some time on Monday to test it and check if it would work.

Cheers.

SeanFeldman commented 2 years ago

You're correct about the two patterns and the ability to OR between those. There are two concerns there:

  1. Performance
  2. SQL Filter constraints

Performance

Let's have a look at the economics of this change. An event published to the topic (bundle-1) would be evaluated against each subscription (a subscription per each endpoint). The message would be assessed against each SQL filter for each subscription until a satisfied filter is found. Each filter evaluation is computed based on, abstracted by the service. A simple filter such as X LIKE Y is not as performant as X = Y but better than X LIKE Y OR X LIKE Z. On a system with a few endpoints and events, it's not a big deal. With a significantly larger system, with a substantial number of events, this can cause a namespace to crawl.

SQL Filter constraints

This one seems less of a concern in most scenarios but still needs to be considered. The maximum length of filter condition string per subscription/endpoint: 1,024 characters in total. Not a common scenario, but never say never.

danielmarbach commented 2 years ago

@gusmanb FYI the documentation PR also shows some header examples.

gusmanb commented 2 years ago

@gusmanb FYI the https://github.com/Particular/docs.particular.net/pull/5660 also shows some header examples.

That looks really good, with that documentation in place to avoid a situation like that is just a matter of reading the docs :)

@SeanFeldman Yes, of course it would use more cpu resources, but I'm not totally sure how many resources would be and if it would really impact the namespace workload, ASB limits resources to your namespace based on credits, and filters take 10 credits no matter its complexity, so I woud assume that Microsoft has taken into account that complex filters could be created, but this is only a suposition.

Anyway, in any case what could at least be included in the filter is the final comma, that would reduce the posibility of missmatch greatly as it would discard extended names with no performance impact, is only a singe character more for the LIKE comparison.

Cheers.

SeanFeldman commented 2 years ago

Anyway, in any case what could at least be included in the filter is the final comma, that would reduce the posibility of missmatch greatly as it would discard extended names with no performance impact, is only a singe character more for the LIKE comparison.

That's a good suggestion. Needs to be validated against native integrationnative integration scenario to ensure backwards compatibility.

danielmarbach commented 2 years ago

Including the final comma would break consumer driven contracts. It is also possible that the list of type names only contains one type and then there will be no semicolon which makes matching on the semicolon tricky too.

SeanFeldman commented 2 years ago

And there we go, CDC.

AFAIR, we've looked into options in the past, and did simplify the filter, but couldn't get it to anything simpler than what it is today.

Another option @gusmanb would be taking control over subscription provisioning and create your own filters.