Open oskar opened 1 year ago
Thanks for the repro sample, this is very helpful! 👍
IIRC, there have been a few optimizations around resolving message types and assembly scanning which are most likely related to your issue.
It seems the problem is caused by the sub-interfaces not being detected as message types in the beginning since they are convention based and don't use the NServiceBus marker interfaces. Convention-based messages are only detected if they are directly referenced from message handlers which isn't the case here. When it tries to load the type at runtime dynamically, the difference in the assembly version causes this logic to fail as the requested version is higher than the one available (the other way should work).
I'll have to dig a bit deeper into this. A potential workaround would be to create empty message handlers for the desired types. This will cause the messages to be correctly registered and the deserialization logic can find the correct types.
Ok so I played around with v8 and v7 a bit and this isn't directly a bug in the logic of NServiceBus but rather related to how assembly loading and type lookups work with .NET:
IItemChanged
message type is loaded and cached as NServiceBus found a handler for it. This is also why adding a handler for the expected message types works.Type.GetType
. Since the incoming message specificies a more concrete type than the one known from the message handler, it attempts to load it using Type.GetType
, and here's where there difference between the versions happens. Note that the incoming fully qualified assembly name of the message contract is higher than the one that is available on disk for the subscriber.
Type.GetType
. Therefore it decides to use the more abstract type that has been identified during endpoint startup.So ultimately, your issue is caused by different behaviors of Type.GetType
. This difference of the behavior lies in the target frameworks: NServiceBus v7 targets .NET Standard which causes it to load some assemblies via Assembly.LoadFrom
, while NServiceBus v8 uses the newer, .NET (Core) specific assembly loading APIs. Assembly.LoadFrom
has some interesting side-effects on the assembly/type resolving logic. I've written a blog post about this quite a while ago in case you want to learn more about it.
That's a lot of details and probably too many details to digest easily. So the workarounds right now would be:
Assembly.LoadFrom("NServiceBus.Core.dll");
at the beginning of your application to enable the same type-resolving logic as NServiceBus version 7Thanks a lot for the in-depth response! 👍
Before opening the issue, we did a bit of debugging on what might have changed in NServiceBus 8 and found this commit in AssemblyScanner.cs and also this .NET runtime issue by @bording. So it does not come as a surprise that you also point in the direction of this changed behavior in .NET.
Your blog post also does a good job explaining why Assembly.LoadFrom("NServiceBus.Core.dll")
is a workaround (while at the same time making it painfully clear that assembly loading in .NET really is non-trivial - I had to read the article a couple of times 😅). It is a workaround we can probably rollout pretty easily (we have a shared infrastructure nuget package that is used for our NServiceBus applications), but I can't shake off the feeling that it is a hack that will come back and bite us again in the future. After all, both your blog post and also the Microsoft employees in the issue linked above recommend avoiding Assembly.LoadFrom()
on .NET Core. Would you trust upcoming .NET versions to maintain this behavior?
Regarding using the NServiceBus marker interface instead of conventions, it is something I don't really think would fit our current architecture. Our messages contract packages target netstandard2.0
and are also only allowed to contain a reference to a small base package (which defines our custom IEvent
, ICommand
and IMessage
.). Having an NServiceBus 8 reference in our contracts would require us to multitarget net6
and net472
, but we actually have some older applications still on NServiceBus 7 which I think are running an even older version of .NET Framework than net472
. To summarize, we prefer conventions over the marker contract to avoid having the NServiceBus reference in our contracts.
There is one more possible workaround and that is to stop publishing our contracts nuget packages with an ever changing assembly version. If we assign no version (or just hard code 1.0.0.0) to the assemblies (but still increase the nuget package version of course), I guess we could avoid the situation? Failing message deserialization just because two versions don't match is always worse than trying and failing on an actual breaking change in the message. As soon as the publishing application goes live with an updated message contract, all consumers/subscribers will be in trouble if there is a breaking change anyway. I haven't really experimented with this internally yet, but do you see any potential problems with this approach? Is there anything regarding assembly loading in NServiceBus that might become problematic if all our messages assemblies would contain the same assembly version even while the message types change over time?
both your blog post and also the Microsoft employees in the issue linked above recommend avoiding Assembly.LoadFrom() on .NET Core. Would you trust upcoming .NET versions to maintain this behavior?
Microsoft has been extremely careful of breaking behaviors of existing APIs, so I think we can expect the behavior to continue working this way, but assembly loading has always been a tricky area with hard to predict behaviors so who knows.
To summarize, we prefer conventions over the marker contract to avoid having the NServiceBus reference in our contracts.
Totally understandable 👍 , I was just trying to list all the options.
There is one more possible workaround and that is to stop publishing our contracts nuget packages with an ever changing assembly version. If we assign no version (or just hard code 1.0.0.0) to the assemblies (but still increase the nuget package version of course), I guess we could avoid the situation?
Yes, there is actually one more option that I didn't mention before (because I forgot about it), which is to remove the "Version=x.x.x.x" attribute from the fully qualified assembly name in the enclosed message types header. When not providing a version, .NET will also be more open to resolving types (I'll have to write an additional blog post on this one as the old one was primarily focused on locating assemblies and it didn't take into account versions). This could also be done by a behavior on either publisher or receiver side. Oversimplified, it could look something like this:
class RemoveVersionBehavior : Behavior<IIncomingPhysicalMessageContext>
{
public override Task Invoke(IIncomingPhysicalMessageContext context, Func<Task> next)
{
if (context.Message.Headers.TryGetValue(Headers.EnclosedMessageTypes, out var enclosedMessageTypes))
{
context.Message.Headers[Headers.EnclosedMessageTypes] = enclosedMessageTypes.Replace("Version=2.0.0.0,", string.Empty);
}
return next();
}
}
and needs to be registered in the pipeline using configuration.Pipeline.Register(new RemoveVersionBehavior(), "Removes the assembly version from the message types header");
right here, the version is hard-coded and you'd have to write some extra logic to make this more flexible (that's probably why it would be easier on the publisher side).
There is one more possible workaround and that is to stop publishing our contracts nuget packages with an ever changing assembly version. If we assign no version (or just hard code 1.0.0.0) to the assemblies (but still increase the nuget package version of course), I guess we could avoid the situation?
I think that would work too but I haven't tested it. The approach should be no problem as long as you make sure that you're not introducing breaking changes to your message types but instead create a new type containing the breaking change.
I wanted to add an additional comment about some potential approaches to "fix" this by making message-type resolution work for this specific scenario. Ultimately, the problem is caused by the desired message type not being registered as a known message type and the runtime resolution logic failing to find a matching type. I've been trying to come up with other scenarios that would run into this issue but I couldn't come up with any right now. This is because there are 3 different conditions necessary, where every combination of just two would be handled by core already I think:
Type.GetType
from the message header values would fail.Type.GetType
would find the specified type and register it as a known message type.IHandleMessages<T>
interface) are also registered as message type. All registered message types should also resolve using match-by-name, even when Type.GetType
would fail (e.g. due to version mismatch)Some ideas on how to make this work:
Type.GetType
.Each approach has different tradeoffs, e.g. performance on startup or message-processing performance, security, expected behavior, and so on.
I was reviewing this bug and wanted to add a comment.
Having an NServiceBus 8 reference in our contracts would require us to multitarget net6 and net472, but we actually have some older applications still on NServiceBus 7 which I think are running an even older version of .NET Framework than net472.
As part of the release of NServiceBus 9, both NServiceBus 8 and NServiceBus 9 have a separate NetStandard 2.0 package with the message interfaces. This could be considered another workaround.
Describe the bug
Description
We have a system with many applications with their own messages contract packages (published as nuget packages) which publishes events that other applications subscribe to (consuming the publishing application's contract nuget package to be able to deserialize the messages).
Each time a new version of an application is released (multiple times per day), the assembly version is bumped by the CI pipeline. But only if the contracts are actually changed, we publish a new contract nuget package. All subscribing applications always try to consume the latest available contract nuget packages from the publishers, to avoid message deserialization problems. But in practice, they almost never consume the exact same assembly version for two reasons:
Secondly, a common scenario is that we have a message hierarchy that can look like this:
The publishing application publishes specific events like
IPriceChanged
andICategoryChanged
and the consuming application might subscribe to all item changes usingIHandleMessages<IItemChanged>
Thirdly, sometimes the consuming application downcasts the messages to the specific types:
In NServiceBus 7, this scenario has always worked as expected. The subscriber receives an instance of type
IPriceChanged__impl
orICategoryChanged__impl
to theHandle
method so the downcasting works. After upgrading to NServiceBus 8, this behaviour has changed. TheHandle
method now receives an instance ofIItemChanged__impl
- making the downcast fail.Right now we have identified two overlapping workarounds:
IPriceChanged
andICategoryChanged
instead ofIItemChanged
)We consider this changed behavior a bug in NServiceBus 8, but would like to hear how you interpret the situation. Please see the repro app linked below. In that app we have verified that the behavior works on NSB 7 and also works if the assembly versions are exactly matching.
Steps to reproduce
See this repro app: https://github.com/oskar/NSBBaseSubscribeUnmatchingVersions
Relevant log output
No response
Additional Information
No response