dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

API Proposal: Add type to avoid boxing .NET intrinsic types #28882

Open JeremyKuhne opened 5 years ago

JeremyKuhne commented 5 years ago

Background and Motivation

Currently there is no way to pass around a heterogeneous set of .NET value types without boxing them into objects or creating a custom wrapper struct. To facilitate low allocation exchange of value types we should provide a struct that allows passing the most common value types without boxing and still allows storing within other types (including arrays) or on the heap when needed.

ASP.NET and Azure SDK have both expressed a need for this functionality for scenarios such as logging.

The following is an evolved proposal based on feedback from various sources. The original proposal is included below. Key changes were to make this a smaller type, align it with object semantics, support any object, and focus the non-boxing support more narrowly.

Proposed API

public readonly struct Value
{
    public Value(object? value);
    public Value(byte value);
    public Value(byte? value);
    public Value(sbyte value);
    public Value(sbyte? value);
    public Value(bool value);
    public Value(bool? value);
    public Value(char value);
    public Value(char? value);
    public Value(short value);
    public Value(short? value);
    public Value(int value);
    public Value(int? value);
    public Value(long value);
    public Value(long? value);
    public Value(ushort value);
    public Value(ushort? value);
    public Value(uint value);
    public Value(uint? value);
    public Value(ulong value);
    public Value(ulong? value);
    public Value(float value);
    public Value(float? value);
    public Value(double value);
    public Value(double? value);
    public Value(DateTimeOffset value);         // Boxes with offsets that don't fall between 1800 and 2250
    public Value(DateTimeOffset? value);        // Boxes with offsets that don't fall between 1800 and 2250
    public Value(DateTime value);
    public Value(DateTime? value);
    public Value(ArraySegment<byte> segment);
    public Value(ArraySegment<char> segment);
    // No decimal as it always boxes

    public Type? Type { get; }                      // Type or null if the Value represents null
    public static Value Create<T>(T value);
    public unsafe bool TryGetValue<T>(out T value); // Fastest extraction
    public T As<T>();                               // Throws InvalidCastException if not supported

    // For each type of constructor except `object`:
    public static implicit operator Value(int value) => new(value);
    public static explicit operator int(in Value value) => value.As<int>();
}

Fully working prototype

Usage Examples

public static void Foo(Value value)
{
    Type? type = value.Type;
    if (type == typeof(int))
    {
        int @int = value.As<int>();

        // Any casts that would work with object work with Value

        int? nullable = value.As<int?>();

        object o = value.As<object>();
    }

    if (value.TryGetValue(out long @long))
    {
        // TryGetValue follows the same casting rules as "As"
    }

    // Enums are not boxed if passed through the Create method
    Value dayValue = Value.Create(DayOfWeek.Friday);

    // Does not box (until Now is > 2250)
    Value localTime = DateTimeOffset.Now;
    localTime = Value.Create(DateTimeOffset.Now);
    localTime = new(DateTimeOffset.Now);

    // ArraySegment<char> and ArraySegment<byte> are supported
    Value segment = new ArraySegment<byte>(new byte[2]);

    // Any type can go into value, however. Unsupported types will box as they do with object.
    Value otherSegment = new(new ArraySegment<int>(new int[1]));
}

Details

Goals

Other benefits

Other Possible Names

Original Proposal

Currently there is no way to pass around a heterogeneous set of .NET value types without boxing them into objects or creating a custom wrapper struct. To facilitate low allocation exchange of value types we should provide a struct that allows passing the information without heap allocations. The canonical example of where this would be useful is in `String.Format`.

**Related proposals and sample PRs**

- C#: [Efficient Params and String Formatting](https://github.com/dotnet/csharplang/blob/master/proposals/format.md)
- CoreFX: [https://github.com/dotnet/corefx/issues/28379](https://github.com/dotnet/corefx/issues/28379)
- Non-allocating string format prototype: https://github.com/dotnet/corefxlab/pull/2595

**Goals**

1. Support intrinsic value types (int, float, etc.)
2. Support most common value types used in formatting (DateTime)
3. Have high performance
4. Balance struct size against type usage frequency
5. Facilitate "raw" removal of value type data (you want to force cast to int, fine)
6. Provide a mechanism for passing a small collection of Variants via the stack
7. Allow all types by falling back to boxing
8. Support low allocation interpolated strings

**Non Goals**

1. Support all value types without boxing
2. Make it work as well on .NET Framework as it does on Core (presuming it's possible in the final design)

**Nice to Have**

1. Usable on .NET Framework (currently does)

**General Approach**

`Variant` is a struct that contains an object pointer and a "union" struct that allows stashing of arbitrary *blittable* (i.e. `where unmanaged`) value types that are within a specific size constraint.

**Sample Usage**

``` C#
// Consuming method
public void Foo(ReadOnlySpan<Variant> data)
{
    foreach (Variant item in data)
    {
        switch (item.Type)
        {
            case VariantType.Int32:
            // ...
        }
    }
}

// Calling method
public void Bar()
{
    var data = Variant.Create(42, true, "Wow");
    Foo(data.ToSpan());

    // Only needed if running on .NET Framework
    data.KeepAlive();
}
```

**Surface Area**

``` C#
namespace System
{
    /// <summary>
    /// <see cref="Variant"/> is a wrapper that avoids boxing common value types.
    /// </summary>
    public readonly struct Variant
    {
        public readonly VariantType Type;

        /// <summary>
        /// Get the value as an object if the value is stored as an object.
        /// </summary>
        /// <param name="value">The value, if an object, or null.</param>
        /// <returns>True if the value is actually an object.</returns>
        public bool TryGetValue(out object value);

        /// <summary>
        /// Get the value as the requested type if actually stored as that type.
        /// </summary>
        /// <param name="value">The value if stored as (T), or default.</param>
        /// <returns>True if the <see cref="Variant"/> is of the requested type.</returns>
        public unsafe bool TryGetValue<T>(out T value) where T : unmanaged;

        // We have explicit constructors for each of the supported types for performance
        // and to restrict Variant to "safe" types. Allowing any struct that would fit
        // into the Union would expose users to issues where bad struct state could cause
        // hard failures like buffer overruns etc.
        public Variant(bool value);
        public Variant(byte value);
        public Variant(sbyte value);
        public Variant(short value);
        public Variant(ushort value);
        public Variant(int value);
        public Variant(uint value);
        public Variant(long value);
        public Variant(ulong value);
        public Variant(float value);
        public Variant(double value);
        public Variant(decimal value);
        public Variant(DateTime value);
        public Variant(DateTimeOffset value);
        public Variant(Guid value);
        public Variant(object value);

        /// <summary>
        /// Get the value as an object, boxing if necessary.
        /// </summary>
        public object Box();

        // Idea is that you can cast to whatever supported type you want if you're explicit.
        // Worst case is you get default or nonsense values.
        public static explicit operator bool(in Variant variant);
        public static explicit operator byte(in Variant variant);
        public static explicit operator char(in Variant variant);
        public static explicit operator DateTime(in Variant variant);
        public static explicit operator DateTimeOffset(in Variant variant);
        public static explicit operator decimal(in Variant variant);
        public static explicit operator double(in Variant variant);
        public static explicit operator Guid(in Variant variant);
        public static explicit operator short(in Variant variant);
        public static explicit operator int(in Variant variant);
        public static explicit operator long(in Variant variant);
        public static explicit operator sbyte(in Variant variant);
        public static explicit operator float(in Variant variant);
        public static explicit operator TimeSpan(in Variant variant);
        public static explicit operator ushort(in Variant variant);
        public static explicit operator uint(in Variant variant);
        public static explicit operator ulong(in Variant variant);

        public static implicit operator Variant(bool value);
        public static implicit operator Variant(byte value);
        public static implicit operator Variant(char value);
        public static implicit operator Variant(DateTime value);
        public static implicit operator Variant(DateTimeOffset value);
        public static implicit operator Variant(decimal value);
        public static implicit operator Variant(double value);
        public static implicit operator Variant(Guid value);
        public static implicit operator Variant(short value);
        public static implicit operator Variant(int value);
        public static implicit operator Variant(long value);
        public static implicit operator Variant(sbyte value);
        public static implicit operator Variant(float value);
        public static implicit operator Variant(TimeSpan value);
        public static implicit operator Variant(ushort value);
        public static implicit operator Variant(uint value);
        public static implicit operator Variant(ulong value);

        // Common object types
        public static implicit operator Variant(string value);

        public static Variant Create(in Variant variant) => variant;
        public static Variant2 Create(in Variant first, in Variant second) => new Variant2(in first, in second);
        public static Variant3 Create(in Variant first, in Variant second, in Variant third) => new Variant3(in first, in second, in third);
    }

    // Here we could use values where we leverage bit flags to categorize quickly (such as integer values, floating point, etc.)
    public enum VariantType
    {
        Object,
        Byte,
        SByte,
        Char,
        Boolean,
        Int16,
        UInt16,
        Int32,
        UInt32,
        Int64,
        UInt64,
        DateTime,
        DateTimeOffset,
        TimeSpan,
        Single,
        Double,
        Decimal,
        Guid
    }

    // This is an "advanced" pattern we can use to create stack based spans of Variant. Would also create at least a Variant3.
    public readonly struct Variant2
    {
        public readonly Variant First;
        public readonly Variant Second;

        public Variant2(in Variant first, in Variant second);

        // This is for keeping objects rooted on .NET Framework once turned into a Span (similar to GC.KeepAlive(), but avoiding boxing).
        [MethodImpl(MethodImplOptions.NoInlining)]
        public void KeepAlive();

        public ReadOnlySpan<Variant> ToSpan();
    }
}
```

**FAQ**

Why "Variant"?
- It does perform a function "similar" to OLE/COM Variant so the term "fits". Other name suggestions are welcome.

Why isn't `Variant` a ref struct?
- Primarily because you can't create a `Span` of ref structs.
- We also want to give the ability to store arrays of these on the heap when needed.

What about variadic argument support (`__arglist`, [`ArgIterator`](https://docs.microsoft.com/en-us/dotnet/api/system.argiterator.-ctor?redirectedfrom=MSDN&view=netcore-3.0#System_ArgIterator__ctor_System_RuntimeArgumentHandle_), etc.)?
- Short answer: not sufficient. Referred to as "Vararg" in the CLI specification, the current implementation is primarily for C++/CLI. It isn't supported on Core yet and would require significant investment to support scenarios here reliably and to support non-Windows environments. This would put any solution based on this way out and may make down level support impossible.

What about [`TypedReference`](https://docs.microsoft.com/en-us/dotnet/api/system.typedreference?view=netcore-2.2) and `__makeref`, etc.?
- `TypedReference` is a ref struct (see above). `Variant` gives us more implementation flexibility, doesn't rely on undocumented keywords, and is actually faster. (In a simple [test](https://gist.github.com/JeremyKuhne/0c68e3dcefa2273b3d2817c43b812ee8) of wrapping/unwrapping an int it is roughly 10-12% faster depending on inlining.)

Why not support anything that fits?
- We could in theory, but there would be safety concerns with getting the data back out. To support high performance usage we want to allow hard casts of value data.

How about enums?
- This one may be worth it and is technically doable. Still investigating...

cc: @jaredpar, @vancem, @danmosemsft, @jkotas, @davidwrighton, @stephentoub

benaadams commented 5 years ago

Would this be a 16 byte (Guid/Decimal) + enum sized struct? (24 bytes with padding on x64)

jkotas commented 5 years ago

TypedReference is a ref struct (see above). Variant gives us more implementation flexibility, doesn't rely on undocumented keywords, and is actually faster.

These all can be fixed, without too much work. TypedReference has been neglected, but that does not mean it is a useless type. (Some of this is described in https://github.com/dotnet/corefx/issues/29736.)

I think fixing TypedReference would be a better choice than introducing a new Variant type, if everything else is equal.

Allow all types by falling back to boxing

I think the design should allow all types without falling back to boxing.

Work on .NET Framework

This should be a non-goal. It is fine if the winning design that we pick happens to work on .NET Framework, but trying to make it work on .NET Framework should be an explicit non-goal. We have made a conscious decision not to restrict our design choices to what works on .NET Framework.

JeremyKuhne commented 5 years ago

Would this be a 16 byte (Guid/Decimal) + enum sized struct? (24 bytes with padding on x64)

Goal is 24 bytes. We've looked at a lot of different ways of packing that in. A pointer and 16 bytes of data. It might involve some contortions or dropping down to 12 bytes of data.

but that does not mean it is a useless type.

Not trying to imply it is useless, just not appropriate in this case. I'm not sure how you'd make it a non-ref struct or make it as fast as something targeted at key types.

This should be a non-goal.

Fair enough, I've changed it to nice-to-have. There are, however, real business needs for mitigating formatting inefficiencies on .NET Framework.

I think the design should allow all types without falling back to boxing.

I think we should have some design that does this but I don't think we can provide a solution that solves everything for all scenarios well. Having multiple approaches doesn't seem like a terrible thing to me, particularly given that we could make this sort of solution available much much sooner than full varargs support.

stephentoub commented 5 years ago

I think we should have some design that does this but I don't think we can provide a solution that solves everything for all scenarios well. Having multiple approaches doesn't seem like a terrible thing to me, particularly given that we could make this sort of solution available much much sooner than full varargs support.

FWIW, this approach feels very limited to me, in that I see supporting every value type as a key scenario. I would rather see, for example, a simple unsafe annotation/attribute that would let the API tell the JIT that it promises wholeheartedly an argument won't escape, and then add an overload that takes a [UnsafeWontEscape] params ReadOnlySpan<object> args, where the JIT would stack-allocate the boxes for any value types provided. Just an example.

JeremyKuhne commented 5 years ago

FWIW, this approach feels very limited to me, in that I see supporting every value type as a key scenario. I would rather see, for example, a simple unsafe annotation/attribute that would let the API tell the JIT that it promises wholeheartedly an argument won't escape, and then add an overload that takes a [UnsafeWontEscape] params ReadOnlySpan<object> args, where the JIT would stack-allocate the boxes for any value types provided. Just an example.

To be super clear, I don't see this as a solves-all-boxing solution. I absolutely think we can benefit from broader approaches, but I have a concern about being efficient with core types. Being able to quickly tell that you have an int and extract it is super valuable I think. Certainly for the String.Format case, for example. :)

jkotas commented 5 years ago

Being able to quickly tell that you have an int and extract it is super valuable I think.

Depends on how the actual formatting is implemented. If you can dispatch a virtual formatting method, the ability to switch over a primitive type does not seem super valuable.

[UnsafeWontEscape] params ReadOnlySpan<object> args

Something like this would work too. It is pretty similar to ReadOnlySpan<TypedReference> on the surface, with different tradeoffs and low-level building blocks.

stephentoub commented 5 years ago

It is pretty similar to ReadOnlySpan<TypedReference> on the surface, with different tradeoffs and low-level building blocks.

I'd be fine with that as well if it was similarly seamless to a caller.

jaredpar commented 5 years ago

Rather than an attribute and a promise I'd like to leverage the type system if possible here. 😄

What if instead we added a JIT intrinsic that "boxes" value types into a ref struct named Boxed. This type would have just enough information to allow manipulation of the boxed value:

ref struct Boxed {
  Type GetBoxedType();
  T GetBoxedValue<T>();
}

The JIT could choose to make this a heap or stack allocation depending on the scenario. The important part is that it would move the boxing operation into a type whose lifetime we need to carefully monitor. The compiler will do it for us.

That doesn't completely solve the problem because you can't have ReadOnlySpan<Boxed> as a ref struct can't be a generic argument. That's not because of a fundamental limitation of the type system but more because we didn't have a motivating scenario. Suppose this scenario was enough and we went through the work in C# to allow it. Then we could have the signature of the method be params ReadOnlySpan<Boxed>. No promises needed here, the compiler will be happy to make developers do the right thing 😉

stephentoub commented 5 years ago

That also sounds reasonable.

(Though the [UnsafeWontEscape] approach could also work on the existing APIs: we just attribute the existing object arguments in the existing methods, and apps just get better.)

jkotas commented 5 years ago

How would struct Boxed differ from existing TypedReference (with extra methods added to make it useful)?

Either way, it sounds reasonable too.

vancem commented 5 years ago

I do think that if our goal is just to solve the parameter passing problem, something based on references (which can work uniformly on all types) is worth thinking about (this is Jan's TypedReference approach).

However that does leave out the ability to have something that can represent anything (and all primitives efficiently, without extra allocation) and that you can put into objects (which is what Variant is).

I think the fact that we don't have a standard 'Variant' type in the framework is rather unfortunate. Ultimately it is an 'obvious' type to have in the system (even if ultimately you solve the parameter passing issue with some magic stack allocated array of types references).

I am also concerned that we are solving a 'simple' problem (passing parameters) with a more complex one (tricky reference based classes whose safety is at best subtle).

I think we should have a Variant class, it is straightforward, and does solve some immediate problems without having to design a rather advanced feature (that probably would not make V3.0).

For what it is worth...

jkotas commented 5 years ago

we don't have a standard 'Variant' type in the framework is rather unfortunate.

I agree with that and the Variant proposal would look reasonable to me if the Variant was optimized for primitive types only. The proposal makes it optimized for primitive types and a set of value types that we think are important for logging today. It does not feel like a design that will survive over time. I suspect that there will be a need to optimize more types, but it won't be possible to extend the design to fit them.

vancem commented 5 years ago

Note that generally speaking, a Variant is a chunk of memory that holds things in-line and a pointer that allows you to hold 'anything'.

Semantically it is always the case that a variant can hold 'anything', so that is nice in that there is not a 'semantic' cliff, only a performance cliff (thus as long as the new types that we might want to add in the future are not perf critical, things are OK). I note that the list of types that really are perf-critical is pretty small and likely to not change over time (int, string; second tier are long, and maybe DateTime(Offset)). So I don't think we are taking a huge risk there.

And there are things you can do 'after the fact'. Let's assume we only allotted 16 bytes for in-line data but we wanted something bigger. If there is any 'skew' to the values (there would be for most types, but not for randomly generated IDs), you could at least store the 'likely' values inline and box the rest. It would probably be OK, and frankly it really is probably the right tradeoff (it would be surprising to me if a new type in the future so dominated the perf landscape over existing types that it was the right call to make the struct bigger to allow it to be stored inline). That has NEVER happened so far.

Indeed from a cost-benefit point of view, we really should be skewing things to the int and string case because these are so much more likely to dominate hot paths. We certainly don't want this to be bigger than 3 pointers, and it would be nice to get it down to 2 (but that does require heroics for any 8 byte sized things (long, double, datetime ...)), so I think we are probably doing 3.

But it does feel like a 'stable' design (5 years from now we would not feel like we made a mistake). Sure, bigger types will be slow, but I don't think we would want to make the type bigger even if we could. It would be the wrong tradeoff.

So, I think Variant does have a reasonably stable design point that can stand the test of time.

From my point of view, I would prefer that the implementation be tuned for the overwhelmingly likely case (int and string). My ideal implementation would be 8 bytes of inline data / discriminator, and 1 object pointer. This is a pro

stephentoub commented 5 years ago

One of the main use cases this is being proposed for is around string interpolation and string formatting.

I realize there are other use cases, so not necessarily instead of something Variant-like, but specifically to address the case of string interpolation, I had another thought on an approach...

Today, you can define a method like:

AppendFormat(FormattableString s);

and use that as the target of string interpolation, e.g.

AppendFormat($"My type is {GetType()}.  My value is {_value:x}.");

Imagine we had a pattern (or an interface, though that adds challenge for ref structs) the compiler could recognize where a type could expose a method of the form:

AppendFormat(object value, ReadOnlySpan<char> format);

The type could expose additional overloads as well, and the compiler would use normal overload resolution when determining which method to call, but the above would be sufficient to allow string interpolation to be used with the type in the new way. We could add this method to StringBuilder, for example, along with additional overloads for efficiency, e.g.

public class StringBuilder
{
    public void AppendFormat(object value, ReadOnlySpan<char> format);
    public void AppendFormat(int value, ReadOnlySpan<char> format);
    public void AppendFormat(long value, ReadOnlySpan<char> format);
    public void AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format);
    // ... etc.
}

We could also define new types (as could anyone), as long as they implemented this pattern, e.g.

public ref struct ValueStringBuilder
{
    public ValueStringBuilder(Span<char> initialBuffer);

    public void AppendFormat(FormattableString s);
    public void AppendFormat(object value, ReadOnlySpan<char> format);
    public void AppendFormat(int value, ReadOnlySpan<char> format);
    public void AppendFormat(long value, ReadOnlySpan<char> format);
    public void AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format);
    // ... etc.

    public Span<char> Value { get; }
}

Now, when you call:

ValueStringBuilder vsb = …;
vsb.AppendFormat($"My type is {GetType()}.  My value is {_value:x}.");

rather than generating what it would generate today if this took a FormattableString:

vsb.AppendFormat(FormattableStringFactory.Create("My type is {0}. My value is {1:x}.", new object[] { GetType(), (object)_value }));

or if it took a string:

vsb.AppendFormat(string.Format("My type is {0}. My value is {1:x}.", GetType(), (object)_value));

it would instead generate:

vsb.AppendFormat("My type is ", default);
vsb.AppendFormat(GetType(), default);
vsb.AppendFormat(". My value is ", default);
vsb.AppendFormat(_value, "x");
vsb.AppendFormat(".", default);

There are more calls here, but most of the parsing is done at compile time rather than at run time, and a type can expose overloads to allow any type T to avoid boxing, including one that takes a generic T if so desired.
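As a rough illustration of the lowering described above, the sketch below hand-writes the call sequence the compiler might emit. `SketchBuilder` and `LoweringDemo` are hypothetical names invented for this example (they are not BCL types), and the `string` overload is an assumption added so literal segments bind without ambiguity.

```csharp
using System;
using System.Text;

// Hypothetical builder following the AppendFormat pattern; not a BCL type.
public sealed class SketchBuilder
{
    private readonly StringBuilder _sb = new StringBuilder();

    // Literal segments bind here (identity conversion from string).
    public void AppendFormat(string value, ReadOnlySpan<char> format) => _sb.Append(value);

    // Fallback for any type; value types passed here are boxed by the caller.
    public void AppendFormat(object value, ReadOnlySpan<char> format)
    {
        if (value is IFormattable formattable)
            _sb.Append(formattable.ToString(format.IsEmpty ? null : format.ToString(), null));
        else
            _sb.Append(value?.ToString());
    }

    // Specialized overload: formats the int into a stack buffer, so no boxing occurs.
    public void AppendFormat(int value, ReadOnlySpan<char> format)
    {
        Span<char> buffer = stackalloc char[32];
        if (value.TryFormat(buffer, out int written, format))
        {
            ReadOnlySpan<char> formatted = buffer.Slice(0, written);
            _sb.Append(formatted);
        }
        else
        {
            // Rare case: the formatted result does not fit in the stack buffer.
            _sb.Append(value.ToString(format.ToString()));
        }
    }

    public override string ToString() => _sb.ToString();
}

public static class LoweringDemo
{
    // Hand-written equivalent of $"My type is {type}. My value is {value:x}."
    public static string Format(Type type, int value)
    {
        var sb = new SketchBuilder();
        sb.AppendFormat("My type is ", default);
        sb.AppendFormat(type, default);
        sb.AppendFormat(". My value is ", default);
        sb.AppendFormat(value, "x");
        sb.AppendFormat(".", default);
        return sb.ToString();
    }
}
```

Normal overload resolution sends the `int` argument to the non-boxing overload and the `Type` argument to the `object` fallback, which is the behaviour the pattern relies on.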

benaadams commented 5 years ago

If you throw out Guid and Decimal (as they are 16 bytes); then you could use the object pointer as the discriminator; rather than enum.

e.g.

public readonly struct Variant : IFormattable
{
    private readonly IntPtr _data;
    private readonly object _typeOrData; 

    public unsafe bool TryGetValue<T>(out T value) where T : IFormattable
    {
        if (typeof(T) == typeof(int))
        {
            if ((object)typeof(T) == _typeOrData)
            {
                value = Unsafe.As<IntPtr, int>(in _data);
                return true; // stored type matches, so report success
            }

            value = default;
            return false;
        }
        // etc.
    }

    public override string ToString()
    {
        return ToString(null, null);
    }

    public string ToString(string format, IFormatProvider formatProvider)
    {
        if ((object)typeof(int) == _typeOrData)
        {
            return Unsafe.As<IntPtr, int>(in _data).ToString(format, formatProvider);
        }
        // etc.
    }
}

And box others to _typeOrData, not ideal though

vancem commented 5 years ago

@benaadams - Generally I like the kind of approach you are suggesting.

In my ideal world, Variant would be an object reference and 8 bytes of buffer. It should be super-fast on int and string, and non-allocating on data types 8 bytes or smaller (by using the object as a discriminator for 8 byte types). For data types larger than 8 bytes, either box, or encode the common values into 8 bytes or less and box the uncommon values.

This has the effect of skewing the perf toward the overwhelmingly common cases of int and string (and they don't pay too much extra bloat for the rarer cases).
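A minimal sketch of that shape, under stated assumptions: `SmallVariant` and its members are hypothetical names, and private sentinel objects stand in for the discriminator (a variation on the `typeof(T)` comparison shown earlier, chosen so a stored `Type` instance cannot be mistaken for a marker).

```csharp
using System;

// Hypothetical variant: one object reference plus 8 bytes of inline data.
public readonly struct SmallVariant
{
    // Sentinels identify which inline type the 8 bytes hold (assumption: one per supported type).
    private static readonly object s_int32 = new object();
    private static readonly object s_double = new object();

    private readonly object _marker; // a sentinel for inline values, otherwise the stored object
    private readonly long _bits;     // inline storage for values of 8 bytes or fewer

    private SmallVariant(object marker, long bits)
    {
        _marker = marker;
        _bits = bits;
    }

    public static SmallVariant From(int value) => new SmallVariant(s_int32, value);

    public static SmallVariant From(double value) =>
        new SmallVariant(s_double, BitConverter.DoubleToInt64Bits(value));

    // Reference types are stored directly; other value types would arrive here already boxed.
    public static SmallVariant From(object value) => new SmallVariant(value, 0);

    public bool TryGetInt32(out int value)
    {
        bool match = ReferenceEquals(_marker, s_int32);
        value = match ? (int)_bits : 0;
        return match;
    }

    public bool TryGetDouble(out double value)
    {
        bool match = ReferenceEquals(_marker, s_double);
        value = match ? BitConverter.Int64BitsToDouble(_bits) : 0;
        return match;
    }

    public bool TryGetObject(out object value)
    {
        bool inline = ReferenceEquals(_marker, s_int32) || ReferenceEquals(_marker, s_double);
        value = inline ? null : _marker;
        return !inline;
    }
}
```

Checking for an int is a single reference comparison, and strings need no discriminator at all, which matches the skew toward int and string described above.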

JeremyKuhne commented 5 years ago

@stephentoub Generally speaking I like the idea of moving parsing to compile time. I'll play around to see what sort of perf implications it has.

One thing I'd want to make sure we have an answer for is how we fit ValueFormattableString (or something similar) into this picture. Ideally we can add just one overload to Console.WriteLine() that will magically suck $"" away from Console.WriteLine(string). Could we leverage ValueStringBuilder for this?

int count = 42;
Console.WriteLine($"The count is {count}.");

// And we have the following overload
void WriteLine(in ValueStringBuilder builder);

// Then C# generates:
ValueStringBuilder vsb = new ValueStringBuilder();
// ... the series of Appends() ...
WriteLine(vsb);
vsb.Dispose(); // Note that this isn't critical, it just returns any rented space to the ArrayPool

We could also add overloads that take IFormatProvider, ValueStringBuilder? Or possibly just add an optional IFormatProvider on ValueStringBuilder? Then something like this could happen:

Console.WriteLine(myFormatProvider, $"The count is {count}.");

// Creates the following
ValueStringBuilder vsb = new ValueStringBuilder(myFormatProvider);
// ... the series of Appends() ...
WriteLine(vsb);
vsb.Dispose();

JeremyKuhne commented 5 years ago

@benaadams, @vancem

If you throw out Guid and Decimal (as they are 16 bytes); then you could use the object pointer as the discriminator; rather than enum.

Pulling DateTimeOffset along for the ride is kind of important if we support DateTime, as it is now the preferred replacement. That pushes over 8. The way I would squish that and Guid/Decimal into 24 bytes is to use sentinel objects for Guid/Decimal and squeeze a 4 byte enum into the "union". (Which is the same sort of thing @Vance is talking about, but with a bigger bit bucket.) Ultimately we're stuck with some multiple of 8 due to the struct packing; if we dial down to 16 (the absolute smallest), it would require making 8 byte items slow and putting anything larger into a box.

It would be cool if we could borrow bits from the object pointer (much like an ATOM is used in Win32 APIs), but that obviously would require runtime support.
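To make the size arithmetic concrete, here is a small sketch of an object reference next to a 16-byte "union". `Union16`, `Variant24`, and `LayoutDemo` are hypothetical names for illustration only; the total size of `Variant24` is decided by the runtime and is typically 24 bytes on 64-bit, but that is not guaranteed by this code.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical 16-byte "union": every field overlaps at offset 0.
[StructLayout(LayoutKind.Explicit)]
public struct Union16
{
    [FieldOffset(0)] public long Int64;
    [FieldOffset(0)] public double Double;
    [FieldOffset(0)] public Guid Guid;
    [FieldOffset(0)] public decimal Decimal;
}

// Hypothetical layout: one object reference (sentinel or boxed value) plus the union.
public struct Variant24
{
    public object TypeOrObject;
    public Union16 Data;
}

public static class LayoutDemo
{
    // Returns the managed sizes of the union and of the full struct.
    public static (int Union, int Variant) Sizes() =>
        (Unsafe.SizeOf<Union16>(), Unsafe.SizeOf<Variant24>());
}
```

Calling `LayoutDemo.Sizes()` on a given runtime shows how much of the struct is spent on inline data versus the reference, which is the tradeoff being weighed between 16 and 24 bytes.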

jkotas commented 5 years ago

It is pretty common to pass around strings as Span<char> in modern high performance C#. It would be really nice if the high-performance formatting supported consuming Span<char> items.

stephentoub commented 5 years ago

It would be really nice if the high-performance formatting supported consuming Span<char> items.

This is one of the advantages I see to the aforementioned AppendFormat approach. In theory you just have another AppendFormat(ReadOnlySpan<char> value, ReadOnlySpan<char> format) overload, and then you could do $"This contains a {string.AsSpan(3, 7)}" and have that "just work".

jaredpar commented 5 years ago

@stephentoub

This is one of the advantages I see to the aforementioned AppendFormat approach.

Indeed. In the AppendFormat approach the compiler would simply translate every value in the interpolated string to valueFormattableStringBuilder.AppendFormat(theValue) and then bind the expression exactly as it would be bound if typed out. That means you can add specialized overloads like AppendFormat(ReadOnlySpan<char>) now or years down the road and the compiler would just pick them up.

JeremyKuhne commented 5 years ago

I'm going to break out a separate proposal for "interpolated string -> Append sequence" and do a bit of prototyping to examine the performance.

stephentoub commented 5 years ago

@JeremyKuhne, I opened https://github.com/dotnet/corefx/issues/35986.

MgSam commented 5 years ago

Just to add my 2 cents here: storing heterogeneous data whose types are not known at compile time has a lot more uses than just string interpolation. Take our old friend DataTable for example, to this day it remains the only way in the BCL to hold dynamic tabular data (until and unless a modern DataFrame type is ever added). And 100% of everything that you put in a DataTable is boxed.

Having a true Variant type could bring great performance benefits in such a scenario.

I'd even say it's a far more important scenario than string interpolation. Most metrics have shown the popularity of Python exploding to one of the most-used languages in the last few years. And the reason is the great libraries it has for working with data. The market is clearly saying it wants better and more efficient ways of working with data and .NET should oblige.

JeremyKuhne commented 5 years ago

@MgSam do you think avoiding boxing on common types is good enough? The initial proposal doesn't handle everything, but allows putting data on the heap (e.g. creating Variant[]). There are ways to create references to anything already (__makeref() and TypedReference), but:

Stashing arbitrary struct data in Variant isn't safe, so we have to restrict it to types that are known to have no ill side effects if their backing fields have random data. We're also constrained by the size of what we can stash.

MgSam commented 5 years ago

Yes, I think common types likely cover 95% of the use cases. You don't often have nested objects when working with large tables of data.

sakno commented 3 years ago

I would like to propose a different approach without introducing a new type in the BCL. Somewhere in this repo I saw the proposal introducing a ValueFormattableString type, which is the value type equivalent of the FormattableString class. Let's start from it. In .NET we actually already have a stack-based representation of values of different types: the family of value tuple types. So, ValueFormattableString can be created as a generic value type:

public readonly struct ValueFormattableString<TArgs>
  where TArgs : struct, ITuple
{
  public ValueFormattableString(string format, TArgs args);
}

With tuples, we can avoid boxing of arbitrary value types passed for formatting. Generally, we have two situations here:

The second case is rare and can be handled easily but without optimizations:

string[] formattingArgs = new string[args.Length];
for (int i = 0; i < args.Length; i++)
{
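  object? item = args[i];
  // Use IFormattable when available; otherwise fall back to Object.ToString().
  formattingArgs[i] = item is IFormattable formattable ? formattable.ToString(null, null) : item?.ToString() ?? string.Empty;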
}

Assume that using tuple types (the first case) is the more common way to represent the arguments for the formattable string (moreover, this way can be natively supported by the C# compiler). Now we need to solve the problem of converting an individual tuple element to a string without unnecessary allocations.

Proposal # 1: Introduce JIT intrinsic method like this:

internal static string TupleItemToString<T>(in T tuple, int index, IFormatProvider? provider) where T : struct, ITuple;

JIT can easily replace this method with pure IL implementation for each generic argument T represented by the tuple type. The following example demonstrates transformation of this method for ValueTuple<int, object> tuple type:

internal static string TupleItemToString(in ValueTuple<int, object> tuple, int index, IFormatProvider? provider) => index switch
{
  0 => tuple.Item1.ToString(provider), // because type Int32 implements IFormattable interface
  1 => tuple.Item2.ToString(), // type Object doesn't implement IFormattable interface
  _ => throw new ArgumentOutOfRangeException(nameof(index))
};

Proposal # 2: It requires #26186. Each field can be extracted using TypedReference without boxing and converted to string.

Proposal # 3: Introduce IFormattingArgumentsSupplier public interface in BCL:

public interface IFormattingArgumentsSupplier
{
  string ToString(int index, IFormatProvider? provider = null);

  int Length { get; }
}

Now this interface can be implemented explicitly by each value tuple type. Also, we need to replace ITuple with this interface in constraints:

public readonly struct ValueFormattableString<TArgs>
  where TArgs : struct, IFormattingArgumentsSupplier
{
  public ValueFormattableString(string format, TArgs args);
}

The implementation of such a formattable string is trivial because it's possible to obtain the string representation of an individual tuple item without boxing. Additionally, with such an interface the string.Format method can be overloaded to avoid heap allocations:

public sealed class String
{
  public static string Format<TArgs>(IFormatProvider provider, string format, TArgs args)
    where TArgs : struct, IFormattingArgumentsSupplier;

  public static string Format<TArgs>(string format, TArgs args)
    where TArgs : struct, IFormattingArgumentsSupplier;
}

Usage of this method is very simple from C# because of native support for tuple types:

string result = string.Format("{0} + {1} = {2}", (40, 2, 42));

KrzysztofCwalina commented 3 years ago

We should try to fit this in 16 bytes, and I think it's possible. We could store UTC date times as ticks, and box non-UTC date times, i.e.

        public Variant(DateTimeOffset value) {
            if (value.Offset.Ticks == 0) {
                _i64 = value.Ticks;
                _obj = typeof(DateTimeOffset);
            }
            else {
                _obj = value;
                _i64 = 0;
            }
        }

        public static implicit operator Variant(DateTimeOffset value) {
            return new Variant(value);
        }

        public static explicit operator DateTimeOffset(Variant variant) {
            if (variant._obj.Equals(typeof(DateTimeOffset))) {
                return new DateTimeOffset(variant._i64, TimeSpan.Zero);
            }
            if (variant._obj is DateTimeOffset dto){
                return dto;
            }

            throw new InvalidCastException();
        }

KrzysztofCwalina commented 3 years ago

Why "Variant"? It does perform a function "similar" to OLE/COM Variant so the term "fits". Other name suggestions are welcome.

Value?

rubenprins commented 3 years ago

@KrzysztofCwalina If you want to go the extra mile, it's possible to store DateTimeOffsets with regular offsets in a UInt64, when you do some selective squeezing of the data.

For example, a Variant implementation I've worked on avoids boxing DateTimeOffsets if the offset is a multiple of 15 minutes and the DateTime lies between 1900 and 2160 (that is, store the -14...14 hour offset in 15 minute increments, and use the remaining space in the UInt64 to store the ticks since 1900-01-01). That trick will store pretty much any regular DateTimeOffset inline.
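The packing described above could look roughly like the sketch below. It is not the implementation from that Variant; the class name DateTimeOffsetPacking, the method names, and the exact bit layout (7 high bits for the offset in quarter hours, 57 low bits for UTC ticks since 1900-01-01) are assumptions chosen for illustration. The 260 years between 1900 and 2160 need fewer than 2^57 ticks, and the 113 possible quarter-hour offsets between -14h and +14h fit in 7 bits, so the two fields together fill 64 bits.

```csharp
using System;

// Hypothetical helper: the name and the bit layout are illustrative assumptions.
public static class DateTimeOffsetPacking
{
    private const int TickBits = 57;                          // low 57 bits: UTC ticks since 1900-01-01
    private const ulong TickMask = (1UL << TickBits) - 1;
    private const long QuarterHourTicks = TimeSpan.TicksPerMinute * 15;
    private const long OffsetBias = 14 * 4;                   // maps -14h..+14h to 0..112 quarter hours

    private static readonly long s_baseTicks = new DateTime(1900, 1, 1).Ticks;
    private static readonly long s_limitTicks = new DateTime(2160, 1, 1).Ticks;

    // Returns false when the value cannot be stored inline; the caller would box instead.
    public static bool TryPack(DateTimeOffset value, out ulong packed)
    {
        packed = 0;
        long offsetTicks = value.Offset.Ticks;
        long utcTicks = value.UtcTicks;

        if (offsetTicks % QuarterHourTicks != 0 || utcTicks < s_baseTicks || utcTicks >= s_limitTicks)
        {
            return false;
        }

        ulong quarters = (ulong)(offsetTicks / QuarterHourTicks + OffsetBias);
        packed = (quarters << TickBits) | (ulong)(utcTicks - s_baseTicks);
        return true;
    }

    public static DateTimeOffset Unpack(ulong packed)
    {
        long utcTicks = (long)(packed & TickMask) + s_baseTicks;
        long quarters = (long)(packed >> TickBits) - OffsetBias;
        var offset = new TimeSpan(quarters * QuarterHourTicks);

        // Rebuild the UTC instant, then shift it to the original offset.
        return new DateTimeOffset(utcTicks, TimeSpan.Zero).ToOffset(offset);
    }
}
```

A value with an irregular offset (say, 7 minutes) or a date outside the window makes TryPack return false, which is the point where a Variant-like type would fall back to boxing.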

Similarly for Decimal, which is stored rather sparsely: if Decimal's high is 0 and its mid does not contain 1s above bit 26, you can squeeze the Decimal data with an effective range of -288230376151711743 to +288230376151711743, with its original scale. Again, that accounts for most Decimals used in our systems. You can even handle MinValue and MaxValue by observing that the scale of a Decimal is a value between 0 and 28, and abuse that: we took Int64.MinValue for Decimal.MinValue and Int64.MaxValue for Decimal.MaxValue, as these values set scale bits that are not allowed. You could squeeze even more out if you didn't preserve the original scale.
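A rough sketch of the Decimal squeeze, under stated assumptions: the class name DecimalPacking and the layout (1 sign bit, 5 scale bits, 58 magnitude bits made of the 26 low bits of mid plus all of lo) are illustrative choices, not the layout of the implementation mentioned above, and the MinValue/MaxValue sentinel trick is left out. For brevity it uses the array-returning decimal.GetBits overload, which itself allocates; a real implementation would want an allocation-free way to read the bits.

```csharp
using System;

// Hypothetical helper: the name and the bit layout are illustrative assumptions.
public static class DecimalPacking
{
    private const int ScaleShift = 58;          // bits 58..62 hold the scale (0..28)
    private const int SignShift = 63;           // bit 63 holds the sign
    private const uint MidLimit = 1u << 26;     // mid may only use its low 26 bits

    // Returns false when the value needs more than 58 magnitude bits; the caller would box instead.
    public static bool TryPack(decimal value, out ulong packed)
    {
        packed = 0;
        int[] bits = decimal.GetBits(value);    // [lo, mid, hi, flags]
        uint lo = unchecked((uint)bits[0]);
        uint mid = unchecked((uint)bits[1]);

        if (bits[2] != 0 || mid >= MidLimit)
        {
            return false;
        }

        ulong scale = (ulong)((bits[3] >> 16) & 0xFF);
        ulong sign = bits[3] < 0 ? 1UL : 0UL;   // the sign is the top bit of the flags word
        packed = (sign << SignShift) | (scale << ScaleShift) | ((ulong)mid << 32) | lo;
        return true;
    }

    public static decimal Unpack(ulong packed)
    {
        int lo = unchecked((int)(uint)packed);
        int mid = (int)((packed >> 32) & (MidLimit - 1));
        bool isNegative = (packed >> SignShift) != 0;
        byte scale = (byte)((packed >> ScaleShift) & 0x1F);
        return new decimal(lo, mid, 0, isNegative, scale);
    }
}
```

Because the scale is stored alongside the magnitude, a value such as 1.50m keeps its trailing zero after a round trip.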

KrzysztofCwalina commented 3 years ago

@rubenprins, true. Though the more tricks like that we play, the slower the code will be, and so we will need to measure and think hard if it's worth it.

JeremyKuhne commented 3 years ago

I had put this on the back burner as we had gone a different way with string formatting than my original proposal. With ASP (@davidfowl) and Azure (@KrzysztofCwalina) expressing interest I've reworked this as a smaller type. Here is the updated API proposal (which is more general and takes some of the feedback above into account):

public readonly struct Value
{
    public Value(object? value);
    public Value(byte value);
    public Value(byte? value);
    public Value(sbyte value);
    public Value(sbyte? value);
    public Value(bool value);
    public Value(bool? value);
    public Value(char value);
    public Value(char? value);
    public Value(short value);
    public Value(short? value);
    public Value(int value);
    public Value(int? value);
    public Value(long value);
    public Value(long? value);
    public Value(ushort value);
    public Value(ushort? value);
    public Value(uint value);
    public Value(uint? value);
    public Value(ulong value);
    public Value(ulong? value);
    public Value(float value);
    public Value(float? value);
    public Value(double value);
    public Value(double? value);
    public Value(DateTimeOffset value);         // Boxes with offsets that don't fall between 1800 and 2250
    public Value(DateTimeOffset? value);        // Boxes with offsets that don't fall between 1800 and 2250
    public Value(DateTime value);
    public Value(DateTime? value);
    public Value(ArraySegment<byte> segment);
    public Value(ArraySegment<char> segment);
    // No decimal as it always boxes

    public Type? Type { get; }                      // Type or null if the Value represents null
    public static Value Create<T>(T value);
    public unsafe bool TryGetValue<T>(out T value); // Fastest extraction
    public T As<T>();                               // Throws InvalidCastException if not supported

    // For each type of constructor except `object`:
    public static implicit operator Value(int value) => new(value);
    public static explicit operator int(in Value value) => value.As<int>();
}

I have a working prototype here.

Design considerations:

This tries to find a balance between speed and space. Putting an intrinsic in and getting it out takes less than a nanosecond for me. Most operations are a few to several nanoseconds. Some go to about 20ns. (Note that the prototype doesn't have any internal runtime access.)

Differences from the original proposal:

Value is just another naming suggestion. Any clever terms are welcome. :)

cc: @tarekgh

sakno commented 3 years ago

@JeremyKuhne, @KrzysztofCwalina, I'm curious why the approach proposed in this comment was not accepted. It avoids boxing for arbitrary value types by placing all formatting arguments on the stack in a tuple container. Or did I miss something?

davidfowl commented 3 years ago

We're not mainly trying to solve string formatting anymore.

sakno commented 3 years ago

Ah okay, thanks for the explanation, @davidfowl!

jkotas commented 3 years ago

So what are we trying to solve with this proposal if it is not primarily for string formatting anymore?

I have mentioned to Jeremy offline that the list of types that this handles looks somewhat arbitrary, and whether it works well or not depends a lot on the scenario.

JeremyKuhne commented 3 years ago

So what are we trying to solve with this proposal if it is not primarily for string formatting anymore?

This is a reduced boxing support type. It doesn't solve everything, but with broad coverage of fundamental types and the ability to store data in any other type it is useful. The Azure SDK is going to be using this type internally and ASP.NET is looking at this for logging. Think of APIs and types that take object but whose primary data ends up being intrinsics or DateTimeOffsets.

I have mentioned to Jeremy offline that the list of types that this handles looks somewhat arbitrary, and whether it works well or not depends a lot on the scenario.

I don't think this is arbitrary at all. They're only "fundamental" types in my view. It is all intrinsics outside of decimal. It handles all DateTime and the vast majority of real-world DateTimeOffset. DateTime is pretty important and is the only type with a TypeCode that isn't a ref type or intrinsic. ArraySegment<byte> and ArraySegment<char> allow representing character data (UTF-8, UTF-16), which still fits in this circle. Enum handling also fits, as enums are intrinsics with a set of values.

I do agree that the crispness of definition is important so we shouldn't be pulling in random other things such as Point. One can leverage the enum support if one controls both sides to "smuggle" other value types (say with enum PointData : ulong or enum PackedDataType as a sentinel). With this sort of workaround I think it is easier to keep the line on the current definition.

jkotas commented 3 years ago

The Azure SDK is going to be using this type internally and ASP.Net is looking at this for logging

The proposal needs to have detail on these use cases. Are there going to be any public ASP.NET APIs that consume this type?

I don't think this is arbitrary at all. They're only "fundamental" types in my view.

The set of fundamental types depends on scenario. For example, we have a similar union in this repo here: https://github.com/dotnet/runtime/blob/01b7e73cd378145264a7cb7a09365b41ed42b240/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/TraceLogging/PropertyValue.cs#L29-L74 . It special cases different set of types.

benaadams commented 3 years ago

Are these two examples a kind of struct discriminated union? (Or one with named alas) https://github.com/dotnet/csharplang/issues/113 🤔

agocke commented 3 years ago

This feels a lot like discriminated unions and I wonder if we could build the general purpose feature which allows for any type, including managed, and then have a mechanism for the compiler and runtime to work together to efficiently store constructions which happen to be unmanaged.

davidfowl commented 3 years ago

I think the type needs to flow without being viral. I like both the TypedReference approach for values that can't escape the heap and the variant approach for things that do. I can also see a type like this being super useful for fast reflection and serialization. Today, generics are too viral and don't work for framework code, and TypedReference is ref only and not usable in many scenarios where I'd want to use this. Just as an example, I've been looking at a way to do fast reflection for ages (not boxing the arguments and supporting Span). A version of this type that supported any T would be ideal but I don't know what that would look like or if it would even be possible without runtime support (like span).

The other use case is logging without boxing. I'd like to allow callers to preserve primitive types without boxing and allow the logger provider to unwrap and serialize.

sakno commented 3 years ago

In my proposal I suggested using a tuple as a container. Tuples are normal structs, not ref-like structs. A tuple can represent an arbitrary number of arguments of any type (except Span<T> and ROS<T>). The only thing that must be provided by the runtime is a special intrinsic:

internal static string TupleItemToString<T>(in T tuple, int index, IFormatProvider? provider) where T : struct, ITuple;

It can be converted to more low-level version to be compatible with other scenarios like usage of IBufferWriter<char>:

internal static bool TupleItemToString<T>(in T tuple, int index, Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) where T : struct, ITuple;

A tuple can be passed to a logging or Console method together with the string containing the formatting template. From my point of view, this approach is reusable everywhere you have a set of formatting arguments and a template. Interpolated strings are also covered.

In reality, the implementation of the TupleItemToString method can be done without an intrinsic:

In the case of a JIT intrinsic, TupleItemToString can be replaced with pure IL without reflection.

The public API could look like this:

public sealed class String
{
  public static string Format<TArgs>(IFormatProvider? provider, string format, in TArgs args)
    where TArgs : struct, System.Runtime.CompilerServices.ITuple;

  public static void Format<TArgs>(IFormatProvider? provider, string format, in TArgs args, IBufferWriter<char> output)
    where TArgs : struct, System.Runtime.CompilerServices.ITuple;
}

davidfowl commented 3 years ago

This has all the problems I stated above about generic code. The T needs to flow everywhere, and that's what Variant/Value and TypedReference solve that the ITuple solution does not.

agocke commented 3 years ago

@davidfowl

A version of this type that supported any T would be ideal but I don't know what that would look like or if it would even be possible without runtime support (like span)

Is this in response to my proposal? Discriminated unions are a feature for declaring types. My point is that you wouldn't build a single type to handle all possible use cases, each use case would declare a suitable type, and if that type happens to be purely unmanaged then the compiler would codegen it differently.

davidfowl commented 3 years ago

Is this in response to my proposal? Discriminated unions are a feature for declaring types. My point is that you wouldn't build a single type to handle all possible use cases, each use case would declare a suitable type, and if that type happens to be purely unmanaged then the compiler would codegen it differently.

I'm familiar with the DU proposal but I don't think it's suitable for the same things Variant/Value will be used for.

agocke commented 3 years ago

I'd be interested in an example that you think couldn't be represented with discriminated unions. DUs seem to me a strict increase in expressive power.

davidfowl commented 3 years ago

Here's a canonical example from logging:

https://github.com/dotnet/runtime/blob/721384062e9d812f2e12639319efb0d5c424da63/src/libraries/Microsoft.Extensions.Logging.Abstractions/src/LoggerMessage.cs#L376-L397

We need to flow these generics to avoid boxing through the method, through the return value, then we need to make a generic LogValues object with the same number of generic arguments. Then when these objects get logged, we need the consumer to be able to unpack them from non-generic code, so we end up boxing everything through the IReadOnlyList<KeyValuePair<string, object>> interface. Ideally I would be able to preserve this type information without needing to have all generic code (IReadOnlyList<KeyValuePair<string, Value>>).

This also happens for the reflection APIs where you want to pass a variable sized Span to invoke a method. Here's an example of how reflection could use these APIs:

class MethodInfo
{
    public Value InvokeFast(object instance, Span<Value> args);
}

I really want a way to round trip a T/T[]/Span<T> without forcing all of the code to be generic and without boxing. This is the super power that Value gives me for primitive types and reference types. Custom value types don't get the benefit obviously but I assume we'd need something else for that.

The key issue is that framework code that's shuttling these types around doesn't want to force generics everywhere and it's even more complex when you have multiple generic arguments (like a Span).

Maybe what I want is associated types.

jkotas commented 3 years ago

I really want a way to round trip a T/T[]/Span without forcing all of the code to be generic and without boxing. This is the super power that Value gives me for primitive types and reference types.

The Value type proposed here won't give you this super power. It won't work for Span. TypedReference has this superpower, and that is why the plan we are on with reflection is based on TypedReference.

davidfowl commented 3 years ago

The Value type proposed here won't give you this super power. It won't work for Span. TypedReference has this superpower, and that is why the plan we are on with reflection is based on TypedReference.

I know it won't work with Span but the ref struct restrictions are too much for many scenarios. So I think we need TypedReference and Value (like Span and Memory).

jkotas commented 3 years ago

So I think we need TypedReference and Value (like Span and Memory).

We have that already: TypedReference and object.

This proposal is about creating an alternative storage for object-like values that is more efficient for some set of types and less efficient for the rest. You can imagine using it only as an internal implementation detail when you need to store the data on the heap. If it is an internal implementation detail, it does not need to be a public type, and the set of more efficient types can be tailored for each use case, which is where discriminated unions would be useful.