Open teo-tsirpanis opened 6 years ago
I would be happy to see this change made
My current way of dealing with this is to split the union in nested unions, which improves performance for large DUs, but it's sub optimal at best.
I'd be interested in timings for the tag being a byte or a short. Since many DU matches translate into a table lookup in IL, which I believe requires a native int, it may lead to performance degradation in terms of speed. If so, I'd think the trade-off of size vs speed isn't worth it.
@dsyme How complex is this task? If you think it's not too complex I would like to try to develop a initial implementation.
Would this affect behaviour at runtime using e.g. reflection over a DU?
If using the System.Reflection
classes, then I guess it would.
But F# types aren't supposed to be reflected this way.
Perhaps we could add some metadata to these merged types (maybe a new
attribute), which would make the FSharp.Reflection
experience transparent
from the developer's perspective.
Στις Τετ, 21 Νοε 2018 στις 10:42 π.μ., ο/η Isaac Abraham < notifications@github.com> έγραψε:
Would this affect behaviour at runtime using e.g. reflection over a DU?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fsharp/fslang-suggestions/issues/699#issuecomment-440581901, or mute the thread https://github.com/notifications/unsubscribe-auth/AMEqM2a-d8M2TUIGny6Z0M9uBC2I3LRnks5uxRH7gaJpZM4XA8TO .
I found this suggestion when I was about to propose relaxing the unique-names rule to allow explicit sharing of fields, so
[<Struct>]
type MyType =
| Case1 of i1:int * c1: char
| Case2 of i1:int
| Case3 of s3: string * i1:int
would have just one i1
field. But I'm glad more sophisticated solutions are being considered too.
I just want to say I find the "partial-merge" feature mentioned in "Some other thoughts" vital for this: I often have one or two main (sometimes generic) types that appear in most cases, with a few "attached" fields that depend on the case.
At the very least, I'd expect mixed case types to be merged too. So
[<Struct>]
type MyType =
| Case1 of v1:int
| Case2 of v2:int
| Case3 of v3:single
| Case3 of v4:single
would colapse to one int
, one single
and the tag field. Although I'd argue for my prior suggestion if we couldn't go further.
@teo-tsirpanis I wrote about this a year and a half ago, but it was on an issue that go closed on the compiler. There's a little more discussion on the thread but i'll just repost my comment
I started looking at how these struct unions decompile and it seems like changing how the internal fields of the struct are generated could fix some of this isssue
[<Struct>] type Cases = A of int | B of a:decimal
decompiles to (via ILSpy)
[StructLayout(LayoutKind.Auto, CharSet = CharSet.Auto)]
public struct Cases : IEquatable<test.Cases>, IStructuralEquatable, IComparable<test.Cases>, IComparable, IStructuralComparable
{
public static class Tags
{
public const int A = 0;
public const int B = 1;
}
[DebuggerBrowsable(DebuggerBrowsableState.Never), DebuggerNonUserCode, CompilerGenerated]
internal int item;
[DebuggerBrowsable(DebuggerBrowsableState.Never), DebuggerNonUserCode, CompilerGenerated]
internal decimal _a;
[DebuggerBrowsable(DebuggerBrowsableState.Never), DebuggerNonUserCode, CompilerGenerated]
internal int _tag;
[DebuggerBrowsable(DebuggerBrowsableState.Never), DebuggerNonUserCode, CompilerGenerated]
public int Tag
{
[DebuggerNonUserCode, CompilerGenerated]
get
{
return this._tag;
}
}
the case with an unnamed field defaults to item, so I'm guessing the reason a DU without field names doesn't work is because it always wants to use item and it clashes?
if prefixing the declared fieldname in the code with _
is the convention already being used, couldn't that be done using the case names as well? Cases already enforce a unique name rule.
e.g.
[<Struct>] type Cases = A of int | B of int
internal int _A;
internal int _B;
@ReedCopsey suggested that when all of the cases have the same type we should just store it in the same field since the tag is enough for the comparison to continue working
which would mean only one constructor would need to be generated
[DebuggerNonUserCode, CompilerGenerated]
internal Cases(int _value, int _tag)
{
this._value = _value;
this._tag = _tag;
}
otherwise we could still use the same constructor, instead switching on the tag against the generated static class to set the appropriate field
public struct Cases : IEquatable<test.Cases>, IStructuralEquatable, IComparable<test.Cases>, IComparable, IStructuralComparable
{
public static class Tags
{
public const int A = 0;
public const int B = 1;
}
internal int _tag;
internal int _B;
internal int _A;
internal Cases(int _value, int _tag)
{
this._tag = _tag;
switch (_tag)
{
case Tags.A:
this._A = _value;
case Tags.B:
this._B = _value;
}
}
For even more compactness, I'd like to have
[<Struct>]
type DU = A of char | B of string | C of int //yes, no field names, should be inferred
to compact to
using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Explicit)]
public struct MySystemTime
{
[FieldOffset(0)] internal byte _tag;
[FieldOffset(1)] internal char _A;
[FieldOffset(1)] internal string _B;
[FieldOffset(1)] internal int _C;
//public properties omitted
}
The overlapped fields make the struct even more compact.
The "loss" of type safety here is negated by stating the fact that reading from this struct without checking the tag can result in undefined results.
@Happypig375 I'm afraid your example wouldn't work, you can't overlap references and value-types in memory because the GC can't tell which type it's currently holding. Plus, you should respect alignment when specifing offsets, so not that compact.
Also, LayoutKind.Explicit
can't be used in generic types due to CLR limitations so it wouldn't be applicable most of the time.
So do this only when all cases are value types then?
@happypig375, I think we should separate the proposals on using one field for same type DU's, and overlapping value types with explicit layout. There are many pitfalls for the latter, misalignment (as in your example) for instance leading to worse performance characteristics, because the CPU requires extra moves on non machine word boundaries. It also dampens predictability, which hurts prefetch.
Just sharing a single field if the types are the same (potentially allowing same size value types) seems to me an easier task to get right, without the complexity of those alignment issues.
@Happypig375 You'd need cases to be of unmanaged types, which is even stricter. Otherwise, inner reference members could still overlap with non-references (or other misaligned references) and result in a TypeLoadException
at runtime.
Don't get me wrong, I'd like to have actual memory sharing too. But I think "merge unmanaged members in non-generic unions" is too narrow to be reliable. Unless the CLR adds support for discriminated unions soon, I'd prefer to focus on more general optimizations.
Perhaps we could add some metadata to these merged types (maybe a new attribute), which would make the
FSharp.Reflection
experience transparent from the developer's perspective.
This seems very reasonable and I had a similar thought.
Re-reading this thread, I can see essentially the following, relatively cheap (I hope) areas for optimization by the compiler:
obj
, and cast.Generally, I don't think we really have to optimize reference-type fields in struct DU's, as that's usually a bad choice anyway and is better suited in a reference DU.
None of the above will work if a case-type is a mixed struct (itself a struct with at least one reference type). But for the majority of cases, the above will greatly reduce space and with it, copy-on-write time.
For good measure, we could also add some improvements for DU-fields without values:
enum
, where get_Tag()
and comparison gets erased in favor of table lookup (swich-style)get_Tag()
style (unless there's a better way)Tag = 0
) should become a single brfalse
or brtrue
, like for normal DUs with UseNullAsTrueValue
. This dramatically improves performance for small DUs and voption
in particular.About reflection: I think any reflection would be on the field gettors and the _tag
or get_Tag()
. These gettors ought to remain in place. If reflection exists that relies on the layout of the underlying struct, I don't really see how we can keep that working with a change like this.
I would be happy to see this change made
@dsyme, do you still feel like that? If you do, would the above be attainable? If this gets an approved-in-principle
, I'd be happy to write an RFC and start a PR (possibly incremental), possibly/hopefully with some others in this thread that have shown an interest in implementing this.
We'd presumably lose the field names in merged cases (unless they are the same, that is) when inspecting a struct case in the debugger. Is there anything that can be done to remedy that? It might also be worth dropping the field name requirement at the same time because if the name is not even in the compiled representation, what's the point.
@kerams I made a comment suggesting explicit sharing of fields back in this thread which would address that. I also proposed it recently as a separate suggestion because I don't think this one is reaching consensus about the details anytime soon.
@abelbraaksma
Match over the zero-case (that is, the first case in a struct-DU without a value [...] This dramatically improves performance for small DUs
I've implemented the change, but am not going to bother to create a PR because it makes no difference.
let z (a: ValueOption<int>) =
match a with
| ValueNone -> "dsa"
| _ -> "dsad"
becomes
IL_0000: ldarga.s a
IL_0002: call instance int32 valuetype FSharp.Core]Microsoft.FSharp.Core.FSharpValueOption`1<int32>::get_Tag()
IL_0007: brtrue.s IL_000F
IL_0009: ldstr "dsa"
IL_000E: ret
IL_000F: ldstr "dsad"
IL_0014: ret
instead of
IL_0000: ldarga.s a
IL_0002: call instance int32 valuetype [FSharp.Core]Microsoft.FSharp.Core.FSharpValueOption`1<valuetype [System.Private.CoreLib]System.Int32>::get_Tag()
IL_0007: ldc.i4.0
IL_0008: bne.un.s IL_0010
IL_000a: ldstr "dsa"
IL_000f: ret
IL_0010: ldstr "dsad"
IL_0015: ret
The jitted assembly is the same in both cases
mov [rsp+8],rcx
cmp dword ptr [rsp+0C],0
jne short M01_L00
mov rax,2B329999AF8
mov rax,[rax]
ret
M01_L00:
mov rax,2B329999B00
mov rax,[rax]
ret
Request for feedback:
Any layout-changing proposal is a binary breaking change for existing definitions, which means it could probably only go in via opt-in (e.g. attribute).
But as a small step in the right direction, I am thinking about doing this proposal only for equal field names+types across different cases. Therefore both solving the breaking compat concern (such definition is a hard error as of now, not possible) and also not making debugging harder - the fields will be there with the same names as typed in.
It would of course be only part of the proposals around struct DUs, but at the same time it is one that could go in without significant risk added and help in cases where cases do share the same type of fields.
I'm honestly still getting over being forced to resort to astonishment when I found out, after years of using them, that DU's weren't implemented as a base class and derived classes for each case so that there never were wasted fields. At least having one field per name will be an improvement.
@T-Gro Fully agreed. I made the same proposal ages ago, twice, and I still think it's the only realistic path forward, but the allure of a more powerful solution lead this discussion to a stalemate.
@realparadyne Yes they are, when defined as reference types (the default). But this is about defining value-type DUs (Struct
).
Merge fields of the same type on multi-case union structs
I propose we change the way multi-case union structs are represented.
Let's have the following as an example:
Internally, this type is represented as three integers, plus another one to indicate the union case.
My proposal is to make unions like this (structs where all cases are of the same type (or tuple of types)), internally store the value only once, like the following example.
The existing way of approaching this problem in F# is this one, but it ruins type-safety:
Pros and Cons
The advantage of making this adjustment to F# is generation of more compact types. Struct multi-case unions are mainly used for low-level code where space efficiency is an advantage.
The disadvantage of making this adjustment to F# is
lack of backwards-compatibility. Actually, not at all. We would still enforce the unique names requirement for the structs, just they would have to be redirected to the same field, for C# consumers, or return the default element when accessing the wrong field for the wrong union case.Some other thoughts
For unions with 256 cases or less, we could compact the data structures even more by storing the tag field as a byte. This would apply to all DU types in general,
but I am not sure about the backwards compatibility, we could still expose them as 4-byte signed integers in a property for C# consumers. There might also be issues mith memory alignment and other black magic like that.For even more compactness, we could partially merge the same-type elements of the DU cases as in the example:
The type above could be represented as a struct with only three fields instead of five: one
int
for thei1
,i2
andi3
fields, one string fors3
, and onechar
forc1
. And their position inside the tuple wouldn't matter at all. But this would make the proposal significantly more complex. And it would also raise questions about nested tuples and other stuff like that.Extra information
Estimated cost (XS, S, M, L, XL, XXL): S
Affidavit (please submit!)
Please tick this by placing a cross in the box:
Please tick all that apply: