ExceptionCodec and multiple references to a single exception object

david-obee commented 1 year ago

Hi all, this issue is less of a real 'bug' in Orleans, as you can argue we're using it wrong here. However, it's part of a broader consideration about throwing exceptions from grains, with respect to serialization. I'm looking for some advice and to determine if it's worth expanding the documentation at all.

We've just had an issue caused because we used the GenerateSerializerAttribute to generate a codec for one of our custom exceptions, and that same exception object was referenced in two places. The ExceptionCodec explicitly has special handling around this for serialization and deserialization:

Serialization:

// Exceptions are never written as references. This ensures that reference cycles in exceptions are not possible and is a security precaution. 
ReferenceCodec.MarkValueField(writer.Session);
writer.WriteStartObject(fieldIdDelta, expectedType, typeof(ExceptionCodec));
SerializeException(ref writer, value);
writer.WriteEndObject();

Deserialization:

// In order to handle null values.
if (field.WireType == WireType.Reference)
{
    // Discard the result of consuming the reference and always return null.
    // We do not allow exceptions to participate in reference cycles because cycles involving InnerException are not allowed by .NET
    // Exceptions must never form cyclic graphs via their well-known properties/fields (eg, InnerException).
    var _ = ReferenceCodec.ReadReference<Exception, TInput>(ref reader, field);
    return null;
}

The type we were serializing was like:

[GenerateSerializer]
public class BatchTransactionException : Exception
{
    [Id(0)]
    public List<Result> Results { get; }

    public Message(List<Result> results) : base(new AggregateException(results.Select(x => x.Exception)))
    {
        Results = results;
    }
}

[GenerateSerializer]
public class Result
{
    [Id(0)]
    public int MessageOffset { get; }

    [Id(1)]
    public Exception Exception { get; }

    public Result(int messageOffset, Exception exception)
    {
        MessageOffset = messageOffset;
        Exception = exception;
    }
}

[GenerateSerializer]
public class MyException : Exception
{
    ...
}

The issue comes when the exception contained in BatchTransactionException.Results on a Result is our custom exception (although it can be any exception for which serialization code has been generated).

On serialization, when serializing the exception, first the exception is serialized using it's own generated serializer as part of the InnerException on the BatchTransactionException. As this is using the generated code, it serializes like any other type, and puts in a reference to the object in the serialization context.

Then when it comes to serialize that exception again as part of the Results, it is again using this code-gen serializer, so like any normal type, it recognises that the same object is already serialized (from the serialization context), so inserts a reference to it, rather than reserializing it from scratch.

This is fine, until we come to deserialization. We get to deserializing the second instance of the MyException, the one in the Result, which has been serialized as a reference to the other place it appears which is in the BatchTransactionException.InnerException. However, at this point the deserializer only knows to expect some kind of Exception, so it loads up the ExceptionCodec, and uses that to deserialze. This then sees that we have serialized a reference, which we don't allow for Exceptions, so it get's serialized as a null. This is completely silent, and hence difficult to spot why this breaks when you've not seen this issue before.

To clarify, the reason we've gone with using the GenerateSerializerAttribute for these exceptions is because we need to serialze them fully. We're aware that we can configure the ExceptionCodec to run for our own exceptions by adjusting the namespaces it's considered for, however the ExceptionCodec is not perfect in that it does not handle extra custom properties. In this case, we have a custom exception that is part of our domain, so we want to be able to serialize it fully like we do with any other type our grain calls can return.

I have a few suggestions for what could be done here. We could:

Accept the general restriction of "no more than one reference to a single exception object" in any type we serialize. This avoids the issue, and might be good advice in general (based on the code comment in the ExceptionCodec)? For what it's worth in our case we have changed things so that we no longer do this to side-step the issue entirely.
Expand the serializer documentation, perhaps with a special section on exceptions. I think it would be good to get some guidance on the best way to approach exception serialization in general, what quirks and edge cases there are (like the above), and how the ExceptionCodec works and how you can enable it for more types. I've seen this come up a couple of times in Discord, so I think it's worth having some guidance in the docs. Exceptions are different to other types, in that you can't tell from a signature all the possible exceptions a method could throw, and they can come from third party libraries, so I think there are special considerations that have to be made for them.
Modify the code-gen specifically for Exception derived types, so that they also get the same behavior with respect to references as the ExceptionCodec has. This would allow users to use the code-gen for exceptions and not hit any issues with the ExceptionCodec.

If you agree with any of these suggestions, I would be happy to help out if I can.

ReubenBond commented 1 year ago

Thank you for reporting and for the detailed investigation & write-up, @david-obee!

Regarding modifying the codegen, we could infer [SuppressReferenceTracking] for types which derive from Exception as a special-case. This would result in them always being serialized in full. If there was a reference cycle, the application would crash during serialization time with a StackOverflowException, however (at least that is easier to debug). eg here: https://github.com/dotnet/orleans/blob/28256dba1681e7fafde850ecb10c20b25faaf720/src/Orleans.CodeGenerator/Model/SerializableTypeDescription.cs#L198

In ExceptionCodec, we should throw if the ReadReference return value on the deserialization path is not null: https://github.com/dotnet/orleans/blob/28256dba1681e7fafde850ecb10c20b25faaf720/src/Orleans.Serialization/ISerializableSerializer/ExceptionCodec.cs#L293

Expanding the documentation is a good idea. We could add a section specifically about exception serialization with an example showing how to do it right and a call-out telling developers to avoid creating potential reference cycles.

Any help you can provide is appreciated.

david-obee commented 1 year ago

Hi @ReubenBond, thanks for the comments above, apologies I'm returning to this so late! I've raised a PR implementing your two suggestions here: https://github.com/dotnet/orleans/pull/8698.

There's two parts. The first part I think is definitely worth having (throwing an Exception over silently returning null). That's quite simple and just helpful.

The second part I think is debatable on if it's worth the change (or if I've implemented it the best way). I'll leave decision on whether to keep it or not to you on the PR, I'm happy to take that second part out if you want.

dotnet / orleans

ExceptionCodec and multiple references to a single exception object #8628