dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License
10.04k stars 2.02k forks source link

Issue with generated codec for referenced project record message type #9092

Open kzu opened 1 month ago

kzu commented 1 month ago

Problem: generated serializer in server project for a referenced assembly opted-in to codegen via [assembly: GenerateCodeForDeclaringAssembly] fails to generate proper codec.

Repro solution at https://github.com/kzu/OrleansCodeGenIssue

Copy of the readme from there:

Solution contains:

  1. Models: a class library project using only the Microsoft.Orleans.Serialization.Abstractions package so record types used in grains messages can be annotated with [GenerateSerializer]. This is intended as a contracts assembly, so we want to keep the Orleans dependencies to a minimum.

  2. Hosting Orleans project: this contains the grain and full codegen.

The grain implements two strategies (methods) that showcase the issue (which is a codegen one):

  1. Deposit(Deposit message): the message type is a record in a referenced project, annotated with with [GenerateSerializer]. The type/assembly is referenced and opted-in for referenced assembly codegen via [assembly: GenerateCodeForDeclaringAssembly(typeof(Deposit))]

  2. Deposit2(Deposit2 message): the message type is a record in the same project as the grain, also annotated with [GenerateSerializer].

Other than the declaring project, there is no difference between the two.

Reproduce the bug:

  1. Run the hosting project.
  2. Navigate to https://localhost:7125/account/1/deposit/100. The response should be the new balance. Note how it's always an empty response.
  3. Navigate to https://localhost:7125/account/1/deposit2/100. The response IS the new balance. Every refresh appends more to the balance, which is the correct response.

After hitting 3., you can go back to 2. and see that what you get is the last balance updated by Deposit2 (since there is only one state, to eliminate issues with state persistence). But you can never increment.

The generated codec for one in-project vs the referenced one differs as follows:

public void Serialize<TBufferWriter>(ref global::Orleans.Serialization.Buffers.Writer<TBufferWriter> writer, global::OrleansGeneratorBug.Deposit2 instance)
    where TBufferWriter : global::System.Buffers.IBufferWriter<byte>
{
    global::Orleans.Serialization.Codecs.DecimalCodec.WriteField(ref writer, 0U, instance.Amount);
    writer.WriteEndBase();
}

The referenced type is not writing the Amount value at all:

[global::System.Runtime.CompilerServices.MethodImplAttribute(global::System.Runtime.CompilerServices.MethodImplOptions.AggressiveInlining)]
public void Serialize<TBufferWriter>(ref global::Orleans.Serialization.Buffers.Writer<TBufferWriter> writer, global::Models.Deposit instance)
    where TBufferWriter : global::System.Buffers.IBufferWriter<byte>
{
    writer.WriteEndBase();
}
ReubenBond commented 1 month ago

Thanks for reporting, @kzu! I think you've hit this issue: https://github.com/dotnet/orleans/pull/8860

Does that sound right?

kzu commented 1 month ago

That looks precisely it! Thanks for the quick reply 🙏

kzu commented 1 month ago

Ok @ReubenBond I think I found the root cause: https://github.com/dotnet/roslyn/issues/74634

kzu commented 1 month ago

And the actual root cause might be that (at least in .NET SDK 9.0), the library produces a reference assembly by default and that's what passed to the generator, which obviously has no private impl. details including the backing fields.

ReubenBond commented 1 month ago

There is probably not much which can be done about that, other than perhaps suppressing ref assemblies in the project with codegen: https://github.com/dotnet/roslyn/discussions/72374#discussioncomment-8655916

kzu commented 1 month ago

Turns out that doesn't solve it either: https://github.com/kzu/BackingFieldRecordSymbolMissing/commit/a2f9b6c4bd29479ba279b8c40d9e14cd510b494c

It has to be something else then, but I'm puzzled what it can be!

kzu commented 1 month ago

Ok, so how about something like this:

1 - A user that references the SerializationAbstractions but NOT the codegen/SDK, can be assumed to be someone who will use the serialization annotations but not the codegen in that project itself, which will cause them to run into this very issue.

  1. Therefore, the abstractions package comes with an analyzer that detects that the codegen is NOT also installed (via a compiler visible property).
  2. That analyzer emits a "fake" field with an equally obscure naming convention to simulate the missing backing field, such as: int ǃPropertyNameǃk__BackingField;

    NOTE: The ǃ looks like a ! but it's not, it's a unicode char that's unlikely to be used in this manner. But you could pick any convention.

  3. The Orleans generator detects this scenario (record with primary constructor with missing backing field with the format $"<{property.Name}>k__BackingField"; and tries the workaround one too.
  4. Generated code still assumes the backing field will be there with the original name, so everything works as of now.

The only drawback I see to this approach is that it would emit an unused field in the record in the otherwise pretty pristine "messages/contracts" project. But not much else.

Instead of having a half-broken support for GenerateCodeForDeclaringAssembly with no workaround of fix in sight.

Thoughts?

PS: tried creating a fake IFieldSymbol too but it seems even trickier.

ReubenBond commented 1 month ago

Perhaps we can find a clean/sure-fire way to detect that the target is a record type, find the primary constructor, and create unsafe accessors for the properties since records expose their ctor params as init-only properties. Eg, this code allows us to set Amount post-construction:

var x = new External(45);
SetAmount(x, 35);
Console.WriteLine(x.Amount);

[UnsafeAccessor(UnsafeAccessorKind.Method, Name = "set_Amount")]
extern static void SetAmount(External instance, int value);
kzu commented 1 month ago

That sounds much more solid, yeah!

kzu commented 1 month ago

Shocking, Roslyn isn't buying my fake symbol 😅

image

kzu commented 1 month ago

@ReubenBond should I give this a try or are you going to work on it soon-ish?

ReubenBond commented 1 month ago

Please give it a go if you can. I can assist if you hit any issues. Thank you, @kzu!

kzu commented 1 month ago

ok, I've spent some time understanding how things currently work in the generator.

Do you want to keep the existing field accessor approach or should it be replaced wholesale with the new "unsafe accessor"? What should be done with the existing case of looking up the backing field for properties? I think the case of "missing fields" for properties is likely not exclusive to records with primary constructors, since those are implementation details that (per roslyn) might also be missing in regular classes....

kzu commented 1 month ago

Hey @ReubenBond, I've got a question regarding including primary ctor parameters.

The GenerateSerializerAttribute states that IncludePrimaryConstructorParameters will only default to true for record types. And the code uses ITypeSymbol.IsRecord to check for that.

For the following record struct, however, that property returns false (when defined in a referenced project/assembly):

public record struct Person2ExternalStruct(int Age, string Name)

This seems to be "by design" (however inconsistent) in roslyn. I've verified this to be the case indeed.

This means that the provided failing test still fails even if a (non-struct) record I added now does work with my changes.

I think Orleans should try to hide these generator/roslyn inconsistencies and have a unifying approach in either case. I think the following heuristics for determining a ctor is a primary ctor should be very reliable:

  1. Ctor has non-zero parameters
  2. All parameters match by name exactly with corresponding properties
  3. All matching properties have getter AND setter annotated with [CompilerGenerated].

3 in particular is key to discarding a constructor with an argument which by chance happens to have the same name as a property.

Thoughts?

kzu commented 1 month ago

I think I got a potential fix 💪 https://github.com/dotnet/orleans/pull/9104