dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.09k stars 4.7k forks source link

`FormatterServices.GetUninitializedObject` replacement? #17073

Closed eiriktsarpalis closed 4 years ago

eiriktsarpalis commented 8 years ago

I'm author of FsPickler, a multi-format message serialization library intended as a replacement to BinaryFormatter. The focus of the library is serialization of closures, which means that I need to support things like subtyping and internal, compiler-generated classes that often have no accessible constructors or serialization attribute annotations.

I would be interested in porting FsPickler (and its father project, mbrace) to CoreCLR, however there are many obstacles, mostly related to deprecated APIs. The component I'm more concerned about is the FormatterServices.GetUninitializedObject method which is used for instantiating objects without relying on a parameterless constructor. This is crucial for us since many closure types (particularly F# lambdas) do not come with such constructors.

Do you have any plans for replacing this with equivalent functionality?

/cc https://github.com/dotnet/corefx/issues/6564#issuecomment-212620200 https://github.com/dotnet/coreclr/issues/2715

KrzysztofCwalina commented 8 years ago

@eiriktsarpalis, I think it would be great if we could come up with a solution for scenarios like yours without resorting to using/exposing things that amount to internal/private APIs. The fact that .NET Framework's binary serialization uses internal/private state has been a constant compatibility and security headache.

And so I would be interested in understanding why you think we need to construct unititialized objects (logically not a public API) as opposed to requiring that serializable types expose a public constructor that could be used for serialization (either default or parameterized).

eiriktsarpalis commented 8 years ago

@KrzysztofCwalina granted, this is certainly something which touches internals. But the same thing could be said of any library that uses some form of reflection/binding flags to access internals. There are very few useful serialization libraries out there that do not rely on this.

Like I said, my primary concern is closure serialization. Let's consider the introductory example from the Mobius project:

var lines = sparkContext.TextFile(@"hdfs://path/to/input.txt");  
var wordCounts = lines
      .FlatMap(s => s.Split(' '))
      .Map(w => new KeyValuePair<string, int>(w.Trim(), 1))  
      .ReduceByKey((x, y) => x + y)
      .Collect();

In order to be able to distribute the above workflow across a Spark cluster, it is necessary to be able to serialize a considerably complex object graph that includes lambda expressions. Depending on the language/library implementation, such lambdas are typically encoded using internal, compiler-generated classes which lack accessible constructors or serialization attributes. A good example are F# lambdas which are serializable abstract classes. I do not see how language designers could ensure compiler generated types are serializable without unnecessarily polluting the public API of the generated assembly.

KrzysztofCwalina commented 8 years ago

The thing is that when people use private reflection, they are more likely to be aware that their code might not work at some point, as we don't guarantee the shape of internal/private types to be the same between versions/frameworks. But for serialization, we do try very hard to be cross version compatible.

Public APIs are the way we describe compatibility guarantees of .net types, and it would be great if serialization used the same mechanism. It would make it easy to reason about serialization compatibility.

Good point about polluting APIs. I think there will be some types that might not want to hide some of their public APIs. But I wonder if that could not be addressed with EditorBrowsable attributes and/or serialization proxy types/APIs.

eiriktsarpalis commented 8 years ago

I agree, reflection-based serialization that relies on internal class representations stinks. But for lack of a viable alternative for many classes that exist out there, it's immensly useful in some applications. I get how this can be dangerous looking at it from the point of view of an asp.net developer, but there are other applications in .NET than just the web. The serialization that I'm interested in is of the non-persistable, non-version tolerant, non-heterogeneous communication variety. As mentioned earlier, there are quite a lot of important Microsoft projects that rely on this as well.

I don't use BinaryFormatter for my projects and I agree that it is broken and obsolete. What I'm basically asking for is please let us build our own general-purpose serializers on top of CoreCLR. I get it, reflection is ugly, it invalidates accessibility modifiers and is an all-around security liability. But if used responsibly it lets us write some amazing libraries. It allows us to do runtime metaprogramming and gets us places where the type system on its own simply cannot.

eiriktsarpalis commented 8 years ago

In any case, the removal of the (arguably bad) ISerializable abstraction has ironically reinforced our reliance on field-based serialization instead of reducing it.

zhenlan commented 8 years ago

@KrzysztofCwalina the removal of ISerializable indeed made it hard (if it is not impossible) to maintain compatibility between full and core framework for serialization - Exception types are a good example. What's your thoughts about bringing it back?

cc: @SGuyGe, @shmao

sergeybykov commented 8 years ago

FormatterServices.GetUninitializedObject() is absolutely necessary for Orleans-generated serializers. Without it we'll be entirely broken.

KrzysztofCwalina commented 8 years ago

@zhenlan, you can see dotnet/corefx#8133 on some my current thinking around this issue.

I will look into ISerializable as well, but I think it's a bit more tricky. The two APIs that I proposed we add are general purpose enablers. ISerializable is binary serialization and it comes with its baggage. But we really want to unblock the community, and so nothing is off the table.

eiriktsarpalis commented 8 years ago

@KrzysztofCwalina I should point out that ISerializable is not tied to binary serialization (or any serializer/serialization format for that matter). It merely provides an abstraction for serializer implementations to extract and reconstruct objects using SerializationInfo.

KrzysztofCwalina commented 8 years ago

@eiriktsarpalis, when I said "binary serialization", I meant the specific technology we have in .NET Framework. I do not have anything against serializing to a binary format. I actually think it's great/efficient.

As a side note, do you know of any other serializers using ISerializable besides the serializers in .NET Framework? i.e. I wish ISerializable was an abstraction, but it does not seem to be to me. StreamingInfo is quite locked and does not let me access the data in an out-of-band library (i.e. library that does not use internal framework APIs).

eiriktsarpalis commented 8 years ago

It does, json.net implements support for it and so does our own serializer implementation, FsPickler which supports binary, json, bson and xml serialization formats. SerializationInfo is perfectly accessible by third-party libraries, it's merely a special type of key/value accumulator.

KrzysztofCwalina commented 8 years ago

Yeah, I guess you are right. I was just hoping to be able implement my own SerializationInfo and intercept the calls to AddValue/GetXxx.

zhenlan commented 8 years ago

Just to confirm what @eiriktsarpalis said, ISerializable is not just used by "binary serialization". Other serializers can use it too. Our own serializers DataContractSerializer and DataContractJsonSerializer also use ISerializable.

jdom commented 8 years ago

Hi, While evaluating whether we could eventually build a port of Orleans that supports CoreCLR, I've done some tests with manually creating serializers for exceptions using the FormatterServices.GetUninitializedObject method (via reflection), and seems to unblock us for now. Compared to just having the binary formatter, we are of course constrained with what parts of the exceptions we can serialize, and also of non-Exception types we want to serialize too for inter grain communication. For now I'm only serializing a subset of the Exception fields, but that means custom properties from exceptions will not flow through in any kind of grain interaction (such as ParamName, or ErrorCode, or HttpStatus from different exception types). This is a limitation that I guess we can live with until we get proper support for serialization in .NET Core (as was announced post-RTM).

In a nutshell, I'm serializing the following private fields from System.Exception using reflection: _className, _innerException, _message, _remoteStackTraceString, _stackTraceString. This allows for some troubleshooting of remote logic, since this allows us to preserve the exception type and because Exception.ToString() uses only those fields to print the exception details (unless of course a derived exception decides to override ToString). I only tested that this works in .NET Core RC2 running on Windows, but I'm not familiar whether accessing these private fields through reflection could cause issues in other platforms, could you guys validate?

These are the relevant portions (without any extra optimizations, as this is not production code yet) of the steps I took to serialize exceptions in Orleans:

public class ExceptionSerializer
{
    private static readonly FieldInfo classNameField = typeof (Exception).GetTypeInfo().GetField("_className", (BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public));
    private static readonly Func<Exception, String> getClassNameField = (Func<Exception, String>)SerializationManager.GetGetter(classNameField);
    private static readonly Action<Exception, String> setClassNameField = (Action<Exception, String>)SerializationManager.GetReferenceSetter(classNameField);
    private static readonly FieldInfo innerExceptionField = typeof (Exception).GetTypeInfo().GetField("_innerException", (BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public));
    private static readonly Func<Exception, Exception> getInnerExceptionField = (Func<Exception, Exception>)SerializationManager.GetGetter(innerExceptionField);
    private static readonly Action<Exception, Exception> setInnerExceptionField = (Action<Exception, Exception>)SerializationManager.GetReferenceSetter(innerExceptionField);
    private static readonly FieldInfo messageField = typeof (Exception).GetTypeInfo().GetField("_message", (BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public));
    private static readonly Func<Exception, String> getMessageField = (Func<Exception, String>)SerializationManager.GetGetter(messageField);
    private static readonly Action<Exception, String> setMessageField = (Action<Exception, String>)SerializationManager.GetReferenceSetter(messageField);
    private static readonly FieldInfo remoteStackTraceStringField = typeof (Exception).GetTypeInfo().GetField("_remoteStackTraceString", (BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public));
    private static readonly Func<Exception, String> getRemoteStackTraceStringField = (Func<Exception, String>)SerializationManager.GetGetter(remoteStackTraceStringField);
    private static readonly Action<Exception, String> setRemoteStackTraceStringField = (Action<Exception, String>)SerializationManager.GetReferenceSetter(remoteStackTraceStringField);
    private static readonly FieldInfo stackTraceStringField = typeof (Exception).GetTypeInfo().GetField("_stackTraceString", (BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public));
    private static readonly Func<Exception, String> getStackTraceStringField = (Func<Exception, String>)SerializationManager.GetGetter(stackTraceStringField);
    private static readonly Action<Exception, String> setStackTraceStringField = (Action<Exception, String>)SerializationManager.GetReferenceSetter(stackTraceStringField);

    [CopierMethodAttribute]
    public static Object DeepCopier(Object original)
    {
        Type exceptionType = original.GetType();
        Exception input = (Exception)original;
        Exception result = (Exception)SerializationManager.GetUninitializedObjectWithFormatterServices(exceptionType);
        setClassNameField(result, getClassNameField(input));
        setInnerExceptionField(result, (Exception)SerializationManager.DeepCopyInner(getInnerExceptionField(input)));
        setMessageField(result, getMessageField(input));
        setRemoteStackTraceStringField(result, getRemoteStackTraceStringField(input));
        setStackTraceStringField(result, getStackTraceStringField(input));
        SerializationContext.Current.RecordObject(original, result);
        return result;
    }

    [SerializerMethodAttribute]
    public static void Serializer(Object untypedInput, BinaryTokenStreamWriter stream, Type expected)
    {
        Exception input = (Exception)untypedInput;
        SerializationManager.SerializeInner(getClassNameField(input), stream, typeof (String));
        SerializationManager.SerializeInner(getInnerExceptionField(input), stream, typeof (Exception));
        SerializationManager.SerializeInner(getMessageField(input), stream, typeof (String));
        SerializationManager.SerializeInner(getRemoteStackTraceStringField(input), stream, typeof (String));
        SerializationManager.SerializeInner(getStackTraceStringField(input), stream, typeof (String));
    }

    [DeserializerMethodAttribute]
    public static Object Deserializer(Type expected, BinaryTokenStreamReader stream)
    {
        Type exceptionType = expected;
        Exception result = (Exception)SerializationManager.GetUninitializedObjectWithFormatterServices(exceptionType);
        DeserializationContext.Current.RecordObject(result);
        setClassNameField(result, (String)SerializationManager.DeserializeInner(typeof (String), stream));
        setInnerExceptionField(result, (Exception)SerializationManager.DeserializeInner(typeof (Exception), stream));
        setMessageField(result, (String)SerializationManager.DeserializeInner(typeof (String), stream));
        setRemoteStackTraceStringField(result, (String)SerializationManager.DeserializeInner(typeof (String), stream));
        setStackTraceStringField(result, (String)SerializationManager.DeserializeInner(typeof (String), stream));
        return result;
    }
}

public static class SerializationManager
{
    // Workaround for CoreCLR where FormatterServices.GetUninitializedObject is not public (but might change in RTM so we could remove this then).
    private static readonly Func<Type, object> getUninitializedObjectDelegate =
        (Func<Type, object>)
            typeof(string)
                .GetTypeInfo()
                .Assembly
                .GetType("System.Runtime.Serialization.FormatterServices")
                .GetMethod("GetUninitializedObject", BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static)
                .CreateDelegate(typeof(Func<Type, object>));
    public static object GetUninitializedObjectWithFormatterServices(Type type)
    {
        return getUninitializedObjectDelegate.Invoke(type);
    }

    public static Delegate GetGetter(FieldInfo field)
    {
        return GetGetDelegate(
            field,
            typeof(Func<,>).MakeGenericType(field.DeclaringType, field.FieldType),
            new[] { field.DeclaringType });
    }

    public static Delegate GetReferenceSetter(FieldInfo field)
    {
        var delegateType = typeof(Action<,>).MakeGenericType(field.DeclaringType, field.FieldType);
        return GetSetDelegate(field, delegateType, new[] { field.DeclaringType, field.FieldType });
    }

    private static Delegate GetSetDelegate(FieldInfo field, Type delegateType, Type[] parameterTypes)
    {
        var declaringType = field.DeclaringType;
        if (declaringType == null)
        {
            throw new InvalidOperationException("Field " + field.Name + " does not have a declaring type.");
        }

        // Create a method to hold the generated IL.
        var method = new DynamicMethod(field.Name + "Set", null, parameterTypes, field.FieldType.GetTypeInfo().Module, true);

        // Emit IL to return the value of the Transaction property.
        var emitter = method.GetILGenerator();
        emitter.Emit(OpCodes.Ldarg_0);
        emitter.Emit(OpCodes.Ldarg_1);
        emitter.Emit(OpCodes.Stfld, field);
        emitter.Emit(OpCodes.Ret);

        return method.CreateDelegate(delegateType);
    }

    // other members omitted
}

Just as FYI, this is in an exploratory branch I have here: https://github.com/jdom/orleans/blob/coreclr-multi/src/Orleans/Core/ExceptionSerializer.cs

stephentoub commented 7 years ago

FormatterServices.GetUninitializedObject has been added to .NET Core 1.2. Related discussion at https://github.com/dotnet/corefx/issues/8133.