Open fabianoliver opened 4 months ago
Hmm, I would guess that this is because inner LambdaExpression
s aren't compiled when the outer expression is compiled. Chr.Avro generates a LambdaExpression
for each record type to support recursive references. (To confirm, you should be able to verify that (de)serializers for non-record types do not call CreateDelegate
.)
Given this, we might have to revisit how we build expression trees.
I'm still trying to understand what .NET is doing under the hood there. I don't think it's an issue of compilation per se - in the sense that the nested lamda is compiled fine I think. I'm not even sure it'd really count as a proper bug, whether delegates are or are not cached is definitely more art than science in C# I think, with the how and ifs even having changed over language versions. So probably "just" more of a performance downside particularly noticeable in scenarios with fairly high message rates.
As for delegate creation, my understanding was that non-capturing lambdas are cached, but that doesn't seem to be happening for expressions (though I'd need to double check if the language specs say they can be cached, or must be cached).
The simplest example I could condense it down to so far is the following:
var param = Expression.Parameter(typeof(int));
var innerParam = Expression.Parameter(typeof(Action<int>));
var inner = Expression.Lambda(innerParam.Type, Expression.Block(), "test", new[] { param });
var action = Expression.Lambda<Action<int>>(Expression.Block(new[] { innerParam }, Expression.Assign(innerParam, inner), Expression.Invoke(innerParam, param)), param).Compile();
// Each invocation will trigger delegate creation
action(456);
action(567);
Interestingly, I don't observe the same behaviour when creating an equivalent function/invocation structure directly in code outside of the expression system:
Action<int> inner = _ => { };
Action<int> action= x =>
{
Action<int> innerParam = inner;
innerParam(x);
};
// Delegates are cached just fine
action(456);
action(567);
Even more interestingly, FastExpressionCompiler seems to be doing a better job here. If I compile any of my samples with that, delegates do seem to be cached fine - though I may have to run this through for more complex examples.
After a bit more testing - FastExpressionCompiler isn't able to compile more complex lambda expressions / throws in those cases, so unfortunately doesn't seem to be a great workaround.
Thinking about this a little further, I suspect that in the current way expression trees are created, re-creating delegates might be unavoidable in some cases. Not in the simple examples above, but at least once you have types whose properties are other complex types (whether self-referential or not) - if I understand the current generators right, their lambda would reference variables from the outer scope to access their nested types' deserializers. I think that would inevitably make the lambda capture the outer parameter, and capturing lambdas are never cached. Though hard to verify at the moment given even the simple, non-capturing versions aren't cached in the default compilation path.
I think a few options in increasing order of madness that come to mind at the moment are:
delegate object DeserializeRecord(Type dtoType, string recordName, ref BinaryReader reader)
, then have each generated record deserializer take a delegate of that signature that'd allow it to deserialize nested types without having to capture/reference something from the outer scope? Hard to say for sure though if that'd really do the trickMaybe the first option isn't so bad after all.. :-)
Thanks for the detailed writeup! A few additional thoughts:
Possibly do a lazy workaround that'd generate a simpler expression tree in case where the record is a simple type that contains only primitive fields.
Of the options you laid out, something like that is probably the most pragmatic. (A more sophisticated variation might be to walk the type graph and only do the circular reference trick for types that appear in a cycle.)
Maybe the first option isn't so bad after all.. :-)
This is more or less where we're at too. I'm going to leave this as a bug since it's an unintended consequence of how the serdes are built, but it's unlikely that we'll be able to prioritize working on it.
I was going through a memory dump and noticed unusually high allocations coming from calls to
AsyncSchemaRegistryDeserializer<T>.DeserializeAsync
on a process running on Win10 under .NET8.The allocated types were of type
Delegate
, with their call stack goingDeserializeAsync
->System.Reflection.Emit.DynamicMethod.CreateDelegate(Type, Object)
->Delegate.CreateDelegatenoSecurityCheck
.It appears that every call to
DeserializeAsync
leads to a new delegate being created for every single deserialization call (!!). For example, set breakpoint in Delegate.CoreCLR.cs ->CreateDelegateNoSecurityCheck
, and run something likeCreateDelegateNoSecurityCheck
will be hit (at least?) three times. It's not clear to me yet why that is happeningMore self-contained example to reproduce this: