Particular / NServiceBus

Build, version, and monitor better microservices with the most powerful service platform for .NET
https://particular.net/nservicebus/
Other
2.1k stars 647 forks source link

Subscribing to a base message type when the assembly versions don't match results in an unexpected message instance of the base type instead of the type that was published #6779

Open oskar opened 1 year ago

oskar commented 1 year ago

Describe the bug

Description

We have a system with many applications with their own messages contract packages (published as nuget packages) which publishes events that other applications subscribe to (consuming the publishing application's contract nuget package to be able to deserialize the messages).

Each time a new version of an application is released (multiple times per day), the assembly version is bumped by the CI pipeline. But only if the contracts are actually changed, we publish a new contract nuget package. All subscribing applications always try to consume the latest available contract nuget packages from the publishers, to avoid message deserialization problems. But in practice, they almost never consume the exact same assembly version for two reasons:

  1. We don't create a new nuget package unless the contracts have changed
  2. The publishing application gets a new version for each deploy even if only internal things have changed

Secondly, a common scenario is that we have a message hierarchy that can look like this:

interface IItemChanged : IEvent { }
interface IPriceChanged : IItemChanged { }
interface ICategoryChanged : IItemChanged { }

The publishing application publishes specific events like IPriceChanged and ICategoryChanged and the consuming application might subscribe to all item changes using IHandleMessages<IItemChanged>

Thirdly, sometimes the consuming application downcasts the messages to the specific types:

var price = message switch
{
    IPriceChanged m => m.NewPrice,
    ICategoryChanged m => m.CategoryBasePrice,
    _ => 0
});

In NServiceBus 7, this scenario has always worked as expected. The subscriber receives an instance of type IPriceChanged__impl or ICategoryChanged__impl to the Handle method so the downcasting works. After upgrading to NServiceBus 8, this behaviour has changed. The Handle method now receives an instance of IItemChanged__impl - making the downcast fail.

Right now we have identified two overlapping workarounds:

  1. Subscribe to "concrete" event types instead of base types (i.e. subscribe to IPriceChanged and ICategoryChanged instead of IItemChanged)
  2. Avoid downcasting if subscribing to a base type (which in practice might mean the same as the first workaround)

We consider this changed behavior a bug in NServiceBus 8, but would like to hear how you interpret the situation. Please see the repro app linked below. In that app we have verified that the behavior works on NSB 7 and also works if the assembly versions are exactly matching.

Steps to reproduce

See this repro app: https://github.com/oskar/NSBBaseSubscribeUnmatchingVersions

Relevant log output

No response

Additional Information

No response

timbussmann commented 1 year ago

Thanks for the repro sample, this is very helpful! 👍

IIRC, there have been a few optimizations around resolving message types and assembly scanning which are most likely related to your issue.

It seems the problem is caused by the sub-interfaces not being detected as message types in the beginning since they are convention based and don't use the NServiceBus marker interfaces. Convention-based messages are only detected if they are directly referenced from message handlers which isn't the case here. When it tries to load the type at runtime dynamically, the difference in the assembly version causes this logic to fail as the requested version is higher than the one available (the other way should work).

I'll have to dig a bit deeper into this. A potential workaround would be to create empty message handlers for the desired types. This will cause the messages to be correctly registered and the deserialization logic can find the correct types.

timbussmann commented 1 year ago

Ok so I played around with v8 and v7 a bit and this isn't directly a bug in the logic of NServiceBus but rather related to how assembly loading and type lookups work with .NET:

So ultimately, your issue is caused by different behaviors of Type.GetType. This difference of the behavior lies in the target frameworks: NServiceBus v7 targets .NET Standard which causes it to load some assemblies via Assembly.LoadFrom, while NServiceBus v8 uses the newer, .NET (Core) specific assembly loading APIs. Assembly.LoadFrom has some interesting side-effects on the assembly/type resolving logic. I've written a blog post about this quite a while ago in case you want to learn more about it.

That's a lot of details and probably too many details to digest easily. So the workarounds right now would be:

oskar commented 1 year ago

Thanks a lot for the in-depth response! 👍

Before opening the issue, we did a bit of debugging on what might have changed in NServiceBus 8 and found this commit in AssemblyScanner.cs and also this .NET runtime issue by @bording. So it does not come as a surprise that you also point in the direction of this changed behavior in .NET.

Your blog post also does a good job explaining why Assembly.LoadFrom("NServiceBus.Core.dll") is a workaround (while at the same time making it painfully clear that assembly loading in .NET really is non-trivial - I had to read the article a couple of times 😅). It is a workaround we can probably rollout pretty easily (we have a shared infrastructure nuget package that is used for our NServiceBus applications), but I can't shake off the feeling that it is a hack that will come back and bite us again in the future. After all, both your blog post and also the Microsoft employees in the issue linked above recommend avoiding Assembly.LoadFrom() on .NET Core. Would you trust upcoming .NET versions to maintain this behavior?

Regarding using the NServiceBus marker interface instead of conventions, it is something I don't really think would fit our current architecture. Our messages contract packages target netstandard2.0 and are also only allowed to contain a reference to a small base package (which defines our custom IEvent, ICommand and IMessage.). Having an NServiceBus 8 reference in our contracts would require us to multitarget net6 and net472, but we actually have some older applications still on NServiceBus 7 which I think are running an even older version of .NET Framework than net472. To summarize, we prefer conventions over the marker contract to avoid having the NServiceBus reference in our contracts.

There is one more possible workaround and that is to stop publishing our contracts nuget packages with an ever changing assembly version. If we assign no version (or just hard code 1.0.0.0) to the assemblies (but still increase the nuget package version of course), I guess we could avoid the situation? Failing message deserialization just because two versions don't match is always worse than trying and failing on an actual breaking change in the message. As soon as the publishing application goes live with an updated message contract, all consumers/subscribers will be in trouble if there is a breaking change anyway. I haven't really experimented with this internally yet, but do you see any potential problems with this approach? Is there anything regarding assembly loading in NServiceBus that might become problematic if all our messages assemblies would contain the same assembly version even while the message types change over time?

timbussmann commented 1 year ago

both your blog post and also the Microsoft employees in the issue linked above recommend avoiding Assembly.LoadFrom() on .NET Core. Would you trust upcoming .NET versions to maintain this behavior?

Microsoft has been extremely careful of breaking behaviors of existing APIs, so I think we can expect the behavior to continue working this way, but assembly loading has always been a tricky area with hard to predict behaviors so who knows.

To summarize, we prefer conventions over the marker contract to avoid having the NServiceBus reference in our contracts.

Totally understandable 👍 , I was just trying to list all the options.

There is one more possible workaround and that is to stop publishing our contracts nuget packages with an ever changing assembly version. If we assign no version (or just hard code 1.0.0.0) to the assemblies (but still increase the nuget package version of course), I guess we could avoid the situation?

Yes, there is actually one more option that I didn't mention before (because I forgot about it), which is to remove the "Version=x.x.x.x" attribute from the fully qualified assembly name in the enclosed message types header. When not providing a version, .NET will also be more open to resolving types (I'll have to write an additional blog post on this one as the old one was primarily focused on locating assemblies and it didn't take into account versions). This could also be done by a behavior on either publisher or receiver side. Oversimplified, it could look something like this:

class RemoveVersionBehavior : Behavior<IIncomingPhysicalMessageContext>
{
    public override Task Invoke(IIncomingPhysicalMessageContext context, Func<Task> next)
    {
        if (context.Message.Headers.TryGetValue(Headers.EnclosedMessageTypes, out var enclosedMessageTypes))
        {
            context.Message.Headers[Headers.EnclosedMessageTypes] = enclosedMessageTypes.Replace("Version=2.0.0.0,", string.Empty);
        }

        return next();
    }
}

and needs to be registered in the pipeline using configuration.Pipeline.Register(new RemoveVersionBehavior(), "Removes the assembly version from the message types header");

right here, the version is hard-coded and you'd have to write some extra logic to make this more flexible (that's probably why it would be easier on the publisher side).

There is one more possible workaround and that is to stop publishing our contracts nuget packages with an ever changing assembly version. If we assign no version (or just hard code 1.0.0.0) to the assemblies (but still increase the nuget package version of course), I guess we could avoid the situation?

I think that would work too but I haven't tested it. The approach should be no problem as long as you make sure that you're not introducing breaking changes to your message types but instead create a new type containing the breaking change.

timbussmann commented 1 year ago

I wanted to add an additional comment about some potential approaches to "fix" this by making message-type resolution work for this specific scenario. Ultimately, the problem is caused by the desired message type not being registered as a known message type and the runtime resolution logic failing to find a matching type. I've been trying to come up with other scenarios that would run into this issue but I couldn't come up with any right now. This is because there are 3 different conditions necessary, where every combination of just two would be handled by core already I think:

Some ideas on how to make this work:

Each approach has different tradeoffs, e.g. performance on startup or message-processing performance, security, expected behavior, and so on.

boblangley commented 2 weeks ago

I was reviewing this bug and wanted to add a comment.

Having an NServiceBus 8 reference in our contracts would require us to multitarget net6 and net472, but we actually have some older applications still on NServiceBus 7 which I think are running an even older version of .NET Framework than net472.

As part of the release of NServiceBus 9, both NServiceBus 8 and NServiceBus 9 have a separate NetStandard 2.0 package with the message interfaces. This could be considered another workaround.