Open VladimirReshetnikov opened 4 years ago
Using ITuple
in the code loses the advantage of value tuples.
Using it in generic constraint increases code complexity:
namespace System
{
public static class TupleExtensions
{
public static (T1, T2) ToValueTuple<T1, T2>(this T value) where T : ITuple<T1, T2> =>
value is null
? throw new ArgumentNullException(nameof(value))
: (value.Item1, value.Item2);
public static Tuple<T1, T2> ToTuple<T1, T2>(this T value) where T : ITuple<T1, T2> =>
value is null
? throw new ArgumentNullException(nameof(value))
: value as Tuple<T1, T2> ?? Tuple.Create(value.Item1, value.Item2);
public static void Deconstruct<T1, T2>(this T value, out T1 item1, out T2 item2) where T : ITuple<T1, T2>
{
if (value is null)
throw new ArgumentNullException(nameof(value));
item1 = value.Item1;
item2 = value.Item2;
}
// other existing members...
}
}
I think it's better to consider with advanced FP concepts like type classes, to avoid the complexity.
I have a situation where having ITuple<T0,T1,T2...>
is necessary to avoid boxing: for performing operations over IEnumerable<ValueTuple<...>>
such that they preserve C# value-tuple member names.
Supposing I want to do a Linq Sum
for multiple members of T
in an IEnumerable<T>
in a single-loop iteration - so I did this:
public static ValueTuple<Int32,Int32,Int32,Int32> MultiSum( this IEnumerable<ValueTuple<Int32,Int32,Int32,Int32>> source )
{
if( source is null ) throw new ArgumentNullException( nameof( source ) );
Int32 sumA = 0;
Int32 sumB = 0;
Int32 sumC = 0;
Int32 sumD = 0;
foreach( ( Int32 a, Int32 b, Int32 c, Int32 d ) in source )
{
checked
{
sumA += a;
sumB += b;
sumC += c;
sumD += d;
}
}
return ( sumA, sumB, sumC, sumD );
}
This approach does not preserve value tuple member names: so if you have a call-site like this:
var list = new[]
{
( a1: 1, b2: 4, c3: 7 ),
( a1: 2, b2: 5, c3: 8 ),
( a1: 3, b2: 6, c3: 9 )
};
var sum = list.MultiSum(); // sum == ( Item1: 6, Item2: 15, Item3: 24 )
...then the inferred type of var sum
is ( Int32 Item1, Int32 Item2, Int32 Item3 )
instead of ( Int32 a1, Int32 b2, Int32 c3 )
.
It seems the only way for Roslyn/C# to preserve member names when a method ostensibly returns the same ValueTuple type is by writing generic code for TTuple : struct
- however this is imperfect because there is no interface for ValueTuple
types and you cannot constrain a generic type a specific value-type either, so the best I can do is where TTuple : struct, IEquatable<ValueTuple<Int32,Int32,Int32,Int32>
.
static TTuple MultiSum_but_this_method_boxes<TTuple>( this IEnumerable<TTuple> source )
where TTuple : struct, IEquatable<(Int32,Int32)>
{
Int32 sumA = 0;
Int32 sumB = 0;
foreach( TTuple t in source )
{
if( t is ValueTuple<Int32,Int32> vt )
{
checked
{
sumA += vt.Item1;
sumB += vt.Item2;
}
}
}
ValueTuple<Int32,Int32> ret = ( sumA, sumB );
if( ret is TTuple hmm )
{
return hmm;
}
else
{
throw new InvalidOperationException( "This should never happen." );
}
}
And while this code below ostensibly works, if you examine the IL code the if( tuple is ValueTuple<Int32,Int32,Int32,Int32> vt )
check is compiled to a box TTuple; isinst ValueTuple<Int32,Int32,Int32,Int32>
instruction, which is very suboptimal.
So far the best I can do is this which requires the call-site to supply a trivial Func
to correctly convert TTuple
to ValueTuple<...>
without boxing, however boxing the return value is currently unavoidable (using another trivial Func
to convert the return-value causes C# to drop the ValueTuple member-names for reasons I haven't investigated yet):
public static TTuple MultiSum<TTuple>( this IEnumerable<TTuple> source, Func<TTuple,ValueTuple<Int32,Int32>> trivial )
where TTuple : struct, IEquatable<(Int32,Int32)>
{
ValueTuple<Int32,Int32> sum = MultiSumInner( source.Select( trivial1 ) );
return (TTuple )(Object)sum; // <-- Boxing, ew.
}
private static ValueTuple<Int32,Int32> MultiSumInner( IEnumerable<ValueTuple<Int32,Int32>> source )
{
Int32 sumA = 0;
Int32 sumB = 0;
foreach( ValueTuple<Int32,Int32> t in source )
{
checked
{
sumA += t.Item1;
sumB += t.Item2;
}
}
return ( sumA, sumB );
}
(I have a T4 script that generates MultiSum
and MultiSumInner
for ValueTuple<Int32,Int32>
through ValueTuple<Int32,Int32,Int32,Int32,Int32,Int32,Int32,Int32>
).
Update: I was able to hack together a generic value-type conversion function using IL Emit:
internal static class UnsafeValueCasts<TFrom,TTo>
where TFrom : struct
where TTo : struct
{
private static readonly Func<TFrom,TTo> _convertFunc = GenericsIL.CreateValueCast<TFrom,TTo>();
public static TTo Convert( TFrom source )
{
return _convertFunc( source );
}
}
internal static class GenericsIL
{
public static Func<TFrom,TTo> CreateValueCast<TFrom,TTo>()
where TFrom : struct
where TTo : struct
{
// TODO: Validate that TFrom and TTo are compatible value-types (i.e. have the exact same layout).
Type someType = typeof(ValueCasts<TFrom,TTo>);
DynamicMethod method = new DynamicMethod(
name : typeof(TFrom).Name + "_to_" + typeof(TTo).Name,
returnType : typeof(TTo),
parameterTypes: new[] { typeof(TFrom) },
m : someType.Module,
skipVisibility: true
);
ILGenerator ilGen = method.GetILGenerator();
ilGen.Emit( OpCodes.Ldarg_0 ); // Linqpad uses `ldarg.s 00` for some reason.
ilGen.Emit( OpCodes.Ret ); // That simple, huh?
Delegate d = method.CreateDelegate( typeof(Func<TFrom,TTo>) );
Func<TFrom,TTo> fd = (Func<TFrom,TTo>)d;
return fd;
}
}
With the boxing-free MultiSum
wrapper now:
public static TTuple MultiSum2<TTuple>( this IEnumerable<TTuple> source )
#if NET48_OR_GREATER || NETCOREAPP3_1_OR_GREATER
where TTuple : struct, IEquatable<(Int32,Int32)>, ITuple
#else
where TTuple : struct, IEquatable<(Int32,Int32)>
#endif
{
if( source is null ) throw new ArgumentNullException( nameof( source ) );
ValueTuple<Int32,Int32> sum = MultiSum( source.Select( UnsafeValueCasts< /*TFrom:*/ TTuple, /*TTo:*/ ValueTuple<Int32,Int32> >.Convert ) );
return .UnsafeValueCasts< /*TFrom:*/ ValueTuple<Int32,Int32>, /*TTo:*/ TTuple >.Convert( sum );
}
So now the call-sites maintain the value-tuple member names:
var list = new[]
{
( a1: 1, b2: 4, c3: 7 ),
( a1: 2, b2: 5, c3: 8 ),
( a1: 3, b2: 6, c3: 9 )
};
var sum = list.MultiSum(); // sum == ( a1: 6, b2: 15, c3: 24 )
@Jehoel
And while this code below ostensibly works, if you examine the IL code the
if( tuple is ValueTuple<Int32,Int32,Int32,Int32> vt )
check is compiled to abox TTuple; isinst ValueTuple<Int32,Int32,Int32,Int32>
instruction, which is very suboptimal.
While the IL does contain the box
instruction, I believe the JIT is smart enough to perform no actual boxing for this code. Have you tried to run a benchmark using that version of the code, measuring allocations?
@svick Getting VS to measure allocations in a Unit Test project is awkward, so for now I did a 10,000-iteration test with a Stopwatch
with a Release build, and the results surprised me:
[TestClass]
public class ValueTypeConversionTests
{
[TestMethod]
public void Quick_and_dirty_perf_test()
{
// Warmup and correctness:
{
var a = UnsafeValueCasts < ValueTuple<Int32,String>, (Int32 x, String y) >.Convert( ( 123, "string1" ) );
var b = SafeButBoxedValueCasts< ValueTuple<Int32,String>, (Int32 x, String y) >.Convert( ( 456, "string2" ) );
a.x.ShouldBe( 123 );
a.y.ShouldBe( "string1" );
b.x.ShouldBe( 456 );
b.y.ShouldBe( "string2" );
}
{
var x = UnsafeValueCasts < (Int32 x, String y), ValueTuple<Int32,String> >.Convert( ( 789, "string3" ) );
var y = SafeButBoxedValueCasts< (Int32 x, String y), ValueTuple<Int32,String> >.Convert( ( 012, "string4" ) );
x.Item1.ShouldBe( 789 );
x.Item2.ShouldBe( "string3" );
y.Item1.ShouldBe( 012 );
y.Item2.ShouldBe( "string4" );
}
Stopwatch sw = Stopwatch.StartNew();
for( Int32 i = 0; i < 10 * 1000; i++ )
{
var x = UnsafeValueCasts< ValueTuple<Int32,String>, (Int32 x, String y) >.Convert( ( i, "string5" ) );
x.x.ShouldBe( i );
}
TimeSpan unsafeTime = sw.Elapsed;
sw.Restart();
for( Int32 i = 0; i < 10 * 1000; i++ )
{
var x = SafeButBoxedValueCasts< ValueTuple<Int32,String>, (Int32 x, String y) >.Convert( ( i, "string6" ) );
x.x.ShouldBe( i );
}
TimeSpan safeTime = sw.Elapsed;
//
this.TestContext.WriteLine( "10,000 iterations: Unsafe time: {0:N2}ms. Safe time: {1:N2}ms", unsafeTime.TotalMilliseconds, safeTime.TotalMilliseconds );
unsafeTime.ShouldBeLessThan( safeTime );
}
}
Results for 10,000 iterations (Release build):
Target | Unsafe IL Emit | Safe (potentially boxing) isinst |
Fastest |
---|---|---|---|
.NET Framework 4.8 | 2.97ms | 6.04ms | IL Emit |
.NET Core 3.1 | 6.93ms | 3.54ms | Safe boxing |
.NET 5.0 | 6.09ms | 2.95ms | Safe boxing |
This is weird: my IL Emit code runs at half the speed on .NET Core 3.0 or later, while the type-check runs in half the time in .NET Core 3.0+.
The numbers change when I do 100,000 iterations (though I still don't have actual GC figures):
Results for 100,000 iterations (Release build):
Target | Unsafe IL Emit | Safe (potentially boxing) isinst |
Fastest |
---|---|---|---|
.NET Framework 4.8 | 32.25ms | 24.80ms | Safe boxing |
.NET Core 3.1 | 39.24ms | 35.39ms | Safe boxing |
.NET 5.0 | 32.80ms | 29.16ms | Safe boxing |
I'm stumped...
Ah, the Shouldly
library was interfering with things. (I thought the .ShouldBe()
was a simple method, turns out it isn't). After I disabled the .ShouldBe()
check and replaced it with adding x.x.
to an integer value (so the whole loop wouldn't be optimized-away) I get these numbers:
Int64 nvm = 0;
for( Int32 i = 0; i < ITERATIONS; i++ )
{
var x = UnsafeValueCasts< ValueTuple<Int32,String>, (Int32 x, String y) >.Convert( ( i, "string5" ) );
#if PROFILING
nvm += x.x; // So the loop body isn't elided by any of the compilers.
#else
x.x.ShouldBe( i );
#endif
}
Results for 10,000 iterations (Release build):
Target | Unsafe IL Emit | Safe (potentially boxing) isinst |
Fastest |
---|---|---|---|
.NET Framework 4.8 | 0.22ms | 0.23ms | IL Emit (Safe runs in 105% time) |
.NET Core 3.1 | 0.17ms | 0.56ms | IL Emit (Safe runs in 329% time) |
.NET 5.0 | 0.21ms | 0.64ms | IL Emit (Safe runs in 305% time) |
Results for 100,000 iterations (Release build):
Target | Unsafe IL Emit | Safe (potentially boxing) isinst |
Fastest |
---|---|---|---|
.NET Framework 4.8 | 1.13ms | 2.24ms | IL Emit (Safe runs in 198% time) |
.NET Core 3.1 | 2.10ms | 8.43ms | IL Emit (Safe runs in 401% time) |
.NET 5.0 | 1.64ms | 7.35ms | IL Emit (Safe runs in 448% time) |
Interesting that there's actually a substantial performance regression in the Safe/boxing code from .NET Framework to .NET Core 3.1.
I was able to get the code to run in .NET 5.0 with the VS Performance Analyzer for GC allocation info:
Interestingly it performed 20,000 allocations, not the 10,000 I was expecting... this got weird.
I dumped the disassembly of the .NET 5 version of the JIT'd safe-but-boxing method, and I'm seeing two calls to JIT_BoxFastMP_InlineGetThread
and the isinst
op being converted into a call to CastHelpers.IsInstanceOfClass
twice, which is unnecessary.
The method was jitted with the debugger attached, so we see the JIT_DbgIsJustMyCode
check.
So why is the JIT generating such sub-optimal code here?
Currently we have 2 families of types to represent tuples: classes
System.Tuple<T1, T2, ...>
and mutable structsSystem.ValueTuple<T1, T2, ...>
. There are a family of extension methods to convert between them:System.TupleExtensions.ToTuple<...>(...)
andSystem.TupleExtensions.ToValueTuple<...>(...)
. Both families of tuples are already stuck in some APIs for good. Some users have their own tuple types from before tuples were introduced into the standard library.To improve interoperability, avoid making unnecessary copies, and support covariant conversions, I propose to introduce a family of covariant interfaces
System.ITuple<out T1, out T2, ...>
that will be implemented both byTuple<...>
andValueTuple<...>
. Users also can make their custom tuple types implement these interfaces. Here is a sketch of how they might look:The existing static class
System.TupleExtensions
could add extension methods to support these new interfaces: