Open lrhn opened 6 years ago
Would these approaches allow adding static methods like
List<T> values = [foo, bar, baz];
List<T> get evenValues => values.where((e) => e.value % 2 == 0).toList();
to be used like
print(EnumClass.values);
print(EnumClass.evenValues);
or be used with extension methods to make such additional methods available?
Would it not be better to make the more powerful and flexible old-style enums ("The currently available alternative") more convenient to write and use, instead of adding different limited ways to define enums?
Since enums are all sealed classes, couldn't we continue to pretend they are as they are now, but just compile them down to more-or-less-just integers?
That is to say, why are enums necessarily heavy, as currently specified?
That is to say, why are enums necessarily heavy, as currently specified?
In large part because the following program must print "true" and "false", which implies that in general enums must support dynamic dispatch (which de facto means they pretty much must be represented as heap allocated values with vtables).
enum Color { red, green, blue }
class NotAColor {
String index = "hello";
}
void foo(dynamic a) {
print(a.index is String);
}
void main() {
foo(new NotAColor());
foo(Color.red);
}
Can we use the same strategy as we use to make this work?:
class NotAnInt {
String sign = "hello";
}
void foo(dynamic a) {
print(a.sign is String);
}
void main() {
foo(new NotAnInt());
foo(2);
}
@Hixie `int`s are special. It is an interesting idea to make other values special like this. Implementations of LISP often have multiple kinds of special values.
On the VM ints have two representations, one is a boxed value (an allocated object), the other uses a bit in the pointer representation to distinguish it from a real pointer (Smi).
This makes the call `a.sign` more complex than it would be otherwise, even when `a` is typed as an `int`, and makes `a.index` more complicated because `a` might be a Smi, so the VM needs to generate code to check the bit even though `.index` is not defined on `int`.
Other values could be shoe-horned into the pointer bits, but that would add complexity everywhere, just like the current Smi representation makes `a.index` more complex.
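The pointer-bit check described above can be modelled in a few lines. This is only an illustrative sketch that simulates tagged machine words with ordinary Dart integers; it assumes the low bit is 0 for a small integer and 1 for a heap pointer, and the helper names (`tagSmallInt`, `tagPointer`, `describe`) are made up for this example rather than taken from the VM.

```dart
// Illustrative model only: simulates tagged machine words with Dart ints.
// Assumption: low bit 0 marks a small integer, low bit 1 marks a heap pointer.
const int tagMask = 1;

// Shifting left frees the low bit, which stays 0 for small integers.
int tagSmallInt(int value) => value << 1;

// An arithmetic right shift restores the original (possibly negative) value.
int untagSmallInt(int word) => word >> 1;

// Assumes the address is aligned, so its low bit is free to hold the tag.
int tagPointer(int address) => address | tagMask;

bool isSmallInt(int word) => (word & tagMask) == 0;

// Every use of a word of unknown kind has to branch on the tag first.
String describe(int word) => isSmallInt(word)
    ? 'small int ${untagSmallInt(word)}'
    : 'heap object at 0x${(word & ~tagMask).toRadixString(16)}';

void main() {
  print(describe(tagSmallInt(21))); // small int 21
  print(describe(tagSmallInt(-3))); // small int -3
  print(describe(tagPointer(0x1000))); // heap object at 0x1000
}
```

The branch inside `describe` is the extra check that every member access on a value of unknown representation would have to pay for, and each additional kind of tagged value adds another case to it.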
When compiling to JavaScript, there is no kind of value we can play this kind of trick with. If we can come up with a scheme that works well when compiling to JavaScript, that would be preferable to one where JavaScript is significantly worse.
It is important to note that the 'Enums' from protobufs are not the same type as declared by the `enum` declaration. Protobufs have some validation and reflective operations, so the reduction will not be O(10000) to O(1), but to a lesser O(10000).
Enums are ... the only feature affording easy completion.
I don't believe that's true.
The currently available alternative is to just use constant integers ... However, that pattern is less readable, less writable and doesn't complete well in the editor. You have to write `EnumClass` before it lets you complete to `.foo`.
The completion issue could be fixed. We could easily add all static fields (or some subset of static fields) to the list of suggestions, but what we'd need to validate is that doing so actually improved the UX.
On the VM ints have two representations, one is a boxed value (an allocated object), the other uses a bit in the pointer representation to distinguish it from a real pointer (Smi).
FWIW, I think SMI is one of Dart's vestigial features that should be removed. To be fair, it's a very clever trick, and it made a lot of sense back when Dart was a dynamic programming language, where it actually had a huge performance benefit.
In the statically typed AOT world I think all it does is make performance worse and hinder new language features. `int`s should be non-nullable by default (in fact all types should be non-nullable by default), they should be represented by 64-bit integers embedded in the object representation and in the stack frames, and they should be no slower than C++'s 64-bit integers. The same applies to doubles and booleans.
The language should also allow defining custom unboxed types of known size with the same level of efficiency, including enums, grapheme clusters, `Size`, `Offset`, `Array<UNBOXED_TYPE>`, various "measure" values such as length in pixels, time, temperature, currency, percentage. These simple things are boxed in Dart which leads to higher memory usage, higher GC pressure, and likely also bloated native code (I believe @mraleph's Vipunen experiment confirmed that).
As a litmus test Dart should be powerful enough that you could implement your own `String` or https://github.com/dart-lang/language/issues/34 very efficiently.
@yjbanov It's not `Smi`s that are the problem, it's boxed integers in general (`MInt` is also a boxed integer).
Dart can already do many of the optimizations you mention. We can unbox integers in many places, but without whole-program analysis and optimization, we can't do so on API boundaries.
Being nullable is one of the problems, but not the only one. Integers are objects, and `int` is a sub-type of `Object`.
Generic classes will need to be specialized for unboxed values. If I make a `Queue<int>`, and I only want to compile the `Queue` code once, then the elements of all instances need to be homogeneously represented (aka, boxed). To get around that, we can specialize the class, using a completely different implementation and internal representation for `Queue<int>` than we do for `Queue<Object>`. That means generating twice the code, though, or having some extra abstraction layer converting between internal and external representation. And since a `Queue<int>` is assignable to `Queue<Object>`, any use at the latter type will need to box and unbox at the API level.
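The assignability point can be seen directly in today's Dart. The following is a small sketch, with `addThroughObjectView` as a hypothetical helper name; it shows that a `Queue<int>` can be viewed as a `Queue<Object>`, which is why code compiled against `Queue<Object>` has to cope with whatever element representation the underlying instance uses.

```dart
import 'dart:collection';

// Hypothetical helper: tries to add [value] through a Queue<Object> view and
// reports whether the underlying queue accepted it.
bool addThroughObjectView(Queue<Object> queue, Object value) {
  try {
    // The static type allows any Object; the run-time check uses the
    // actual element type of the instance.
    queue.add(value);
    return true;
  } on TypeError {
    return false;
  }
}

void main() {
  final Queue<int> ints = Queue<int>()..addAll([1, 2, 3]);

  // Allowed because Dart generics are covariant: same instance, wider view.
  final Queue<Object> objects = ints;
  print(identical(objects, ints)); // true

  // Elements read through the wider view are handed out as Object values.
  final Object first = objects.first;
  print(first is int); // true

  print(addThroughObjectView(ints, 4)); // true
  print(addThroughObjectView(ints, 'not an int')); // false
}
```

If `Queue<int>` stored raw unboxed integers internally, `objects.first` is exactly the place where a box would have to be created.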
For AOT compilation, you need to know all the specializations ahead of time.
And then there are dynamic invocations. We can probably do something clever, but at the cost of making dynamic invocations even more expensive - they will basically have to match the function signature to the argument list signature at run-time to see which arguments can/must be passed unboxed, and then do boxing/unboxing as necessary.
FWIW, I think SMI is one of Dart's vestigial features that should be removed.
Smi is not a feature. It's an optimization of boxed integers. If there would be no need to box integers - there would be no need for smi's.
As @lrhn points out boxing originates from Dart's dynamic features - if you want to eliminate boxing altogether you need to eliminate dynamism.
FWIW, I think SMI is one of Dart's vestigial features that should be removed.
Smi is not a feature. It's an optimization of boxed integers. If there would be no need to box integers - there would be no need for smi's.
Performance is a feature :)
As @lrhn points out boxing originates from Dart's dynamic features - if you want to eliminate boxing altogether you need to eliminate dynamism.
The goal is not to eliminate dynamism. The goal is to have predictable performance in code that requires it, which is usually very small portion of the code, and therefore the cost of having to deal with some "advanced" language features (if we have them) is relatively low, and at the same time it's where performance really really matters.
As @lrhn points out boxing originates from Dart's dynamic features - if you want to eliminate boxing altogether you need to eliminate dynamism.
Even if we eliminated `dynamic` entirely, we'd still have to box because `int` is a subtype of `Object`. Removing `dynamic` would force you to cast an `Object` to `int` before you use it, but it's still possible to override a method that takes an `int` with one that takes `Object`.
That in turn means a callsite that's passing an integer needs to use a calling convention that can handle that being stored as an Object parameter. That's also a source of trouble for us, isn't it?
We could potentially do what Java and C# do and move numbers out of the object hierarchy. `int` wouldn't be a subtype of `Object`, but you could do a conversion to it to explicitly box.
That has its pros and cons, but might be a better trade-off than Dart's current approach which is to force all users to pay the cost of flexibility all the time, everywhere. I'd be really interested to see some data on how often real code actually uses the fact that numbers are a subtype of `Object`.
That in turn means a callsite that's passing an integer needs to use a calling convention that can handle that being stored as an Object parameter. That's also a source of trouble for us, isn't it?
Let's take this example:
abstract class A {
foo(int a);
}
class B extends A {
@override
foo(Object a) {
print(a);
}
}
main() {
var b = B();
b.foo(5);
b.foo('hello');
}
Can we compile it to the following?
class B extends A {
foo(int a) {
foo$Object(box<a>);
}
foo$Object(Object a) {
print(a);
}
}
main() {
var b = B();
b.foo(5);
b.foo$Object('hello');
}
We could potentially do what Java and C# do and move numbers out of the object hierarchy. `int` wouldn't be a subtype of `Object`, but you could do a conversion to it to explicitly box.
This would be the cleanest approach IMHO. An alternative to boxing is to encode `Object` and `dynamic` pointers as (type, value) tuples, where value is either the value itself or a pointer to an object on the heap. This might actually be better for performance because of data locality, and if values are unique enough (particularly for doubles), they should be strictly better than boxing because each tuple is 2x64 bits long vs each boxed object that requires a minimum of 3x64 bits: (pointer) -> (header, value). You do get the benefit of being able to share values, but something tells me that's not very effective.
That has its pros and cons, but might be a better trade-off than Dart's current approach which is to force all users to pay the cost of flexibility all the time, everywhere. I'd be really interested to see some data on how often real code actually uses the fact that numbers are a subtype of `Object`.
+💯
Can we compile it to the following?
Maybe. If you have a method like `foo(int x, int y, int z)`, you would have to generate eight versions. That's a very large size overhead for ahead-of-time compilation (like JavaScript compiled programs, which you would want to be small).
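The count of eight comes from each of the three parameters independently being passed either unboxed (`int`) or boxed (`Object`), which gives 2^3 combinations. A throwaway sketch that enumerates them, with `calleeVariants` as a made-up helper name:

```dart
/// Lists every boxed/unboxed combination for a method named `foo` with
/// [parameterCount] integer parameters. Purely illustrative.
List<String> calleeVariants(int parameterCount) {
  final variants = <String>[];
  // Each bit of `mask` decides whether one parameter is boxed.
  for (var mask = 0; mask < (1 << parameterCount); mask++) {
    final kinds = [
      for (var i = 0; i < parameterCount; i++)
        ((mask >> i) & 1) == 1 ? 'Object' : 'int',
    ];
    variants.add('foo(${kinds.join(', ')})');
  }
  return variants;
}

void main() {
  final variants = calleeVariants(3);
  print(variants.length); // 8
  variants.forEach(print);
}
```

The number doubles with every additional parameter, which is why the size overhead grows so quickly without whole-program information to prune unused combinations.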
If you have whole-program analyses, you might be able to cut down on some of those cases.
Doing auto-boxing is what Java did. It means that generics only work for objects, so a `List<Integer>` will contain boxed integers. C# went the other direction, and did generic specialization for value types, so a `List<int>` is running different code than `List<Object>`. You get more code duplication and faster code.
Both Java and C# are just-in-time compiled, so you only pay for what you use. Dart is ahead-of-time compiled to JavaScript, so we don't get that luxury.
As for using tuples, I don't see how `Object` and `dynamic` values differ from other types of values.
We have first class functions in Dart. An `Object Function(String) f;` variable can contain a `String Function(Object)` value. Doing `Object o = f("42")` would not statically know that the function expects an `Object` value, nor that it returns a `String`. You would need to make all values tuples, and then you have effectively just introduced tagged values. For smis, which is the overwhelming majority of numbers, that's a doubling in size.
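The function-typing situation described here is easy to reproduce. In this minimal sketch (the function name `describe` is just for illustration), the call site only knows the static type `Object Function(String)`, while the value actually stored has the more specific type `String Function(Object)`:

```dart
// Accepts any Object and always returns a String.
String describe(Object value) => 'value: $value';

void main() {
  // Allowed: String Function(Object) is a subtype of Object Function(String),
  // since parameters are contravariant and return types are covariant.
  Object Function(String) f = describe;

  // The call site statically sees a String going in and an Object coming out.
  Object o = f('42');
  print(o); // value: 42

  // At run time, both the result and the function are more specific.
  print(o is String); // true
  print(f is String Function(Object)); // true
}
```

A caller compiled against the static type therefore cannot pick an argument or return representation based on that type alone.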
That's a very large size overhead for ahead-of-time compilation (like JavaScript compiled programs, which you would want to be small).
A couple of options:
`Object` is rare
If you have whole-program analyses, you might be able to cut down on some of those cases.
I cannot rely on whole-program analysis when optimizing performance-sensitive parts of my code. Whole program analysis optimizations are useful as a last-minute vacuum sealer. I need language primitives that are predictable. For example, I should be able to make performance assumptions that cannot be invalidated by an unrelated part of the code in a large application. Unfortunately, by definition, using non-local information for optimizations is precisely what whole program analysis does.
Doing auto-boxing is what Java did. It means that generics only work for objects, so a `List<Integer>` will contain boxed integers. C# went the other direction, and did generic specialization for value types, so a `List<int>` is running different code than `List<Object>`. You get more code duplication and faster code.
I'm totally fine paying up to maybe 4x code duplication to gain performance of the 5% of code where performance matters.
Both Java and C# are just-in-time compiled, so you only pay for what you use. Dart is ahead-of-time compiled to JavaScript, so we don't get that luxury.
My gut feeling is that it won't impact JS much, but it will likely not have any benefits either. To get benefits from unboxing in JS we should start thinking about WebAssembly (high time we did tbh).
As for using tuples, I don't see how `Object` and `dynamic` values differ from other types of values.
This is not observable by the developer, so there's no difference. I was only suggesting an alternative in-memory representation. Instead of boxing the value, you can use fat pointers, like Go does.
We have first class functions in Dart. An `Object Function(String) f;` variable can contain a `String Function(Object)` value. Doing `Object o = f("42")` would not statically know that the function expects an `Object` value, nor that it returns a `String`. You would need to make all values tuples, and then you have effectively just introduced tagged values. For smis, which is the overwhelming majority of numbers, that's a doubling in size.
The size overhead is only proportional to the amount of numerics that are boxed. My guess is that most of them do not need to be boxed. I think Golang is a good source of stats for stuff like that.
Here is a comment that I added to the discussion in #158, which turns out to fit better here:
@JohnGalt1717, I understand that the `[Flags]` enumerations in C# mentioned here allow for a very performant representation of small sets, based on a numeric type treated as a bit vector.
Dart enum types will not do exactly the same thing unless they are radically redesigned, and, as @munificent mentioned here, those C# enumerations lack a number of capabilities and guarantees that we do have (and presumably won't give up) with Dart enum types.
However, Dart is very likely to be extended with a mechanism like views, and they are directly aimed at supporting low-level operations on a highly performant representation, protected by a specific static type.
So, setting out from the C# examples you mentioned, here's how we could provide support for it using views:
view BitSet on int {
bool operator <=(BitSet x) => this & x == this;
bool operator >=(BitSet x) => this & x == x;
bool operator <(BitSet x) => this != x && this <= x;
bool operator >(BitSet x) => this != x && this >= x;
}
view Languages extends BitSet on int {
static const Languages
CSharp = 0x0001,
VBNET = 0x0002,
VB6 = 0x0004,
Cpp = 0x0008,
FortranNET = 0x0010,
JSharp = 0x0020,
MSIL = 0x0080,
All = CSharp | VBNET | VB6 | Cpp | FortranNET | JSharp | MSIL,
VBOnly = VBNET | VB6,
NonVB = CSharp | Cpp | FortranNET | JSharp | MSIL;
Languages operator |(Languages other) => this | other;
Languages operator &(Languages other) => this & other;
}
view Days extends BitSet on int {
static const Days
Monday = 0x0001,
Tuesday = 0x0002,
Wednesday = 0x0004,
Thursday = 0x0008,
Friday = 0x0010,
Saturday = 0x0020,
Sunday = 0x0040,
Weekend = Saturday | Sunday;
Days operator |(Days other) => this | other;
Days operator &(Days other) => this & other;
}
void main() {
// Usage of `Languages` as a bit set.
Languages lang = Languages.CSharp | Languages.MSIL;
print(lang <= Languages.NonVB); // Subset relation: 'true'.
print(Languages.FortranNET <= lang); // Membership: 'false'.
print(lang == lang); // Equality: 'true'.
// Different bit sets are statically separate.
Days days = Days.Weekend | Days.Wednesday;
lang = days; // Compile-time error.
}
With views, different bit sets can be declared (like `Languages` and `Days`), and they will be mutually distinct types that are not assignable to each other, but all the operations are resolved statically which means that they can be inlined and compiled as low-level bit operations on the `int` representation.
This kind of bit set does not restrict the values (we can introduce the value `0xFFFF` and arrange for it to have the type `Languages`), so we get similar performance and lack-of-guarantees as we have with the C# enumerations.
I think this illustrates that the whole discussion about this kind of feature in Dart may be important in its own right, but it might not be relevant to discussions about `enum` in Dart, because that mechanism is simply not the best fit for this task in Dart.
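For readers on a recent SDK: the views idea later shipped, in modified form, as extension types in Dart 3.3, so the syntax differs from the `view ... on int` sketch above. A rough equivalent of the `Days` bit set under that assumption could look like the following; the member names and the `contains` helper are illustrative choices, not part of the original proposal.

```dart
// Sketch using Dart 3.3+ extension types; names are illustrative.
// The representation is a plain int, so no wrapper object is allocated.
extension type const Days(int bits) {
  static const monday = Days(0x01);
  static const wednesday = Days(0x04);
  static const saturday = Days(0x20);
  static const sunday = Days(0x40);
  static const weekend = Days(0x60); // saturday | sunday

  // Set union and intersection, resolved statically on the int representation.
  Days operator |(Days other) => Days(bits | other.bits);
  Days operator &(Days other) => Days(bits & other.bits);

  // True when every bit of [other] is also set in this value.
  bool contains(Days other) => (bits & other.bits) == other.bits;
}

void main() {
  final days = Days.weekend | Days.wednesday;
  print(days.contains(Days.sunday)); // true
  print(days.contains(Days.monday)); // false
  print(days.bits); // 100

  // As with the views sketch, values are not restricted: any int can be
  // wrapped, so this type gives static separation but no closed set.
  final anything = Days(0xFFFF);
  print(anything.contains(Days.weekend)); // true
}
```

A second extension type such as `Languages` declared the same way would not be assignable to `Days`, which matches the static separation described above.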
@eernstg That's great, however the root issue with enums as they stand, right now for every Dart developer I've talked to, is that you can't serialize them properly without massive amounts of boilerplate and hackery which makes Dart incompatible with most APIs that other languages generate.
You're literally adding an entirely new class of language idioms for something that can be solved by making enum be implicitly a bit shifted num which would, by definition of num, still be inherited from object but would allow all of the bitwise operators, and allow value assignment exactly like C# with no ceremony, and wouldn't break anything, because you can easily check at compile time that assignments are done enum to enum or explicitly with a cast, exactly like C# does, which doesn't break the paradigm and still allows full switch validation as well simply because the switch is on the enum type (which as far as I can tell is all that enums in Dart do, there's literally nothing else to them). And if the user wanted to break out of the box they can explicitly cast the enum to an int and then write the switch based on that and add whatever other arbitrary values they abused enums with. Or you could set an analyzer option that would prevent arbitrary assignment even with cast and voila, no abuse allowed. You could even make that the default. But of course then you'd break easy deserialization of ints in json/protobuf to enums without either disabling it explicitly or some specific function on the enum that did it manually. The former would probably be preferable and allow ignore commands in the code on the file level as an example.
And since Dart's primary (virtually only) job in life at this point is to be the programming language for Flutter, this is of monumental concern because all you ever do is interop with other languages (and even if this wasn't the case, you can see the mess this creates even with Google's own GRPC even if you're interoping with a Dart server). If Dart gets value assignments to enums, then, since I don't use enums in any way that is performant or has a critical path, all of this is immaterial to me on a day to day basis.
But the point of this topic is well taken because indeed, the problem is that Dart enums are objects.
There's absolutely no downside to making enums inherit from num (or int, or whatever else, since C# allows you to specify any numeric type) which itself inherits from object. No code would break as a result of doing so. Nor would any code break if you then use ordinal position in the enum to assign an int if not specified at compile time to every value of the enum like C# does, and then allow users to assign (with = !) the values explicitly if they so choose like C# does and then enable implicitly the bitwise operators (<<,>>,<,>,=,&,|,&=,|= etc). Nor would it introduce a new set of bugs or possible issues, because the compiler by default would throw on any out of bounds assignment that wasn't explicitly cast. (presumably) which would then enable switches to still be exhaustive.
You could further have a separate code path that allows other types to be defined for enums if you so desired by simply allowing the user to assign non-numerics to the enum, and compile/analyser time validating that all types assigned are the same (or not if you want to really shoot yourself in the head). And this could even be a different implementation of the same methodology. Standard num based enums can be hard coded and work as they do right now, but with all of the above with:
enum Something {
one = 1,
two = 2,
four = 4,
}
This would automatically and implicitly be like the original suggestion of "on int".
And if you want to put other values in:
enum SomethingStringy on String {
one = "one",
two = "two",
four = "four",
}
and omitting the type declaration but specifying something other than a num, results in the generic version being used by the compiler/analyzer and it automatically picks the best, most specific type based on the values provided (which Dart already does in lots of cases with numeric values choosing int instead of doubles etc.) and if they are mixed types that don't share a type other than object (i.e. strings and ints) then it can just choose dynamic or object as the type. And this implementation then doesn't have the bitwise operations available which causes a compile time failure if you try and use them, AND uses a set or map in the background.
This would also enable deserialization from the int (or other representation) as simply someVariable = map["field"] as SomeEnum, and the other way would just be map["field"] = someVariable (which would automatically assign the root type to the map on serialization because that could be gotten at runtime when doing the serialization). If you skipped the as SomeEnum in the above, then it would throw because the implicit cast wouldn't be valid, even if map["field"] wasn't dynamic and was a variable of type int, because no implicit casts can be set in your analyzer configuration. Unless I just don't see it, your views implementation doesn't fix the massive amount of boilerplate that is required in this case with enums as they stand and in the view suggestion. (i.e. I still have to have a switch to map them from the int to the actual value, and another switch to map to an int to serialize, which is annoying and error prone. This might not be the case if operator overriding allowed you to override with types that weren't the type on = and handle an int comparison and have it work as explicit assignment in both directions.)
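For comparison, here is roughly what the int-to-enum mapping looks like with the enhanced enums that were added to Dart in version 2.17, after this discussion took place. It is a sketch, not the `as SomeEnum` cast being proposed above: the field name `wireValue` and the lookup `fromWire` are hypothetical, and the enum values remain ordinary objects.

```dart
// Sketch using enhanced enums (Dart 2.17+). `wireValue` and `fromWire`
// are illustrative names, not a standard API.
enum Something {
  one(1),
  two(2),
  four(4);

  const Something(this.wireValue);

  /// The integer used when serializing this value.
  final int wireValue;

  /// Looks up the enum value for a serialized integer, throwing an
  /// ArgumentError for integers that no value declares.
  static Something fromWire(int value) => values.firstWhere(
        (e) => e.wireValue == value,
        orElse: () =>
            throw ArgumentError.value(value, 'value', 'No such Something'),
      );
}

void main() {
  // Serialize: store the declared integer in the map.
  final map = <String, Object?>{'field': Something.two.wireValue};
  print(map['field']); // 2

  // Deserialize: one lookup instead of a hand-written switch.
  final decoded = Something.fromWire(map['field'] as int);
  print(decoded); // Something.two
}
```

This removes the pair of switches, but unknown integers are rejected rather than carried along, which is the opposite of the C# behaviour discussed in this thread.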
As for the request for Java/Kotlin style functionality, I'd suggest again, the C# methodology which is the Enumeration class from which you can inherit, which takes a generic of the enum, which you can define inline and allows you to do whatever you want with methods, constructors, etc. etc. etc. while maintaining all of the functionality of enums. It literally gives you everything that Java/Kotlin does with their implementation while still giving you all of the superior C# properties of enums, with no downside and you don't have to add yet another paradigm to the language. You've incrementally improved Dart's enums, provided all of the functionality that everyone is asking for, and made them more performant by having them, by default, always be operations on a bitmask instead of objects being allocated on the stack.
I.e.:
class EnumClassThingy extends Enumeration<TEnumType> {
... whatever you want classy style here.
}
Where TEnumType is restricted to the new Enum generic type restriction. (TEnumType extends Enum)
And if you really wanted to be fancy, you could allow the enumeration to be defined inline:
class EnumClassThingy extends Enumeration<{one = 1, two = 2, four = 4}> { ... }
(and you could easily use your view implementation to do the same)
Which would of course make the enum not publicly referenceable outside of this class, which would give you exactly the same functionality as Java in one tight little package, while still enabling highly performant operations even within the class on the enum from which the class is based. (and I'm not hung up on the fancy syntax above, you could just have it as a final on the class that has to be passed in or defined or whatever else you want, doesn't really matter)
I believe the above addresses every single concern, and since every single example given in the previous comment about functionality in dart that C# doesn't provide is actually a feature in C# (i.e. not allowing overriding and making a more specific definition with an enum because C# enums don't inherit (they do, but the compiler refuses it for really good reasons) from object, are addressed in C# with the new keyword to replace the inherited implementation if you want, and in Dart, you can easily allow this more specific implementation because enum would inherit from int/num/whatever numeric type you want to set it to, and those themselves inherit from object and you can just make the compiler allow it unlike C# that prevents you from doing this because of an entire class of bugs you'll create doing so.
If there are other things that Dart enums can do that C# enums can't do, I'd love to see them. I've used C# enums for 2 decades and never once run into any obstacles with them that it wasn't better that I did and approached it differently, especially once they implemented Enumeration.
There are like a hundred things to comment on, but it gets overwhelming, so let me just respond to a couple of things:
There's absolutely no downside to making enums inherit from num (or int, or whatever else, since C# allows you to specify any numeric type) which itself inherits from object.
I assume you're claiming that there is no downside to do so in Dart, such that the claim is relevant here.
The Dart notion of a cast (`e as T`) is that we can test whether the given object has the requested type, and it will then succeed and yield the object if it has that type, or it will throw if the object does not have that type. A cast in Dart never changes the given object.
In C# we can cast out of an enum type and into another one, and this may involve significant changes to the target of the cast (even when it is implicit):
using System;
enum A { a1, a2, a3 }
enum B { b1, b2 }
public class Program
{
public static void Main()
{
object o = A.a2;
B b = (B)o;
Console.WriteLine(b.ToString()); // 'b2'.
}
}
IIUC, the inline integer representation of the enum `A.a2` is boxed (copied into a fresh heap object) when it is assigned to `object o`, and the box has type `System.ValueType`, and it is then possible to cast it to many other types, including any enum type. If we change `A.a2` to `A.a3` then the program prints `2`.
In other words, the language makes no attempt to ensure that a variable of type `B` has a value which is actually one of the values declared by `B`, and it will happily reinterpret one enum value as an enum value of a different type, as long as they share the same bits in the underlying numeric representation, and then it will just invent names like `2` for values that don't exist.
Perhaps you will say "so don't do that", but the situation may not be so simple: You could use software written and maintained by others, and it just takes one line of code in a million line program to introduce the reinterpretation.
In Dart, an enum value has a type which is maintained robustly:
enum A { a1, a2, a3 }
enum B { b1, b2 }
void main() {
Object o = A.a2;
var b = o as B; // Throws.
print(b.toString()); // Not reached.
}
If we were to change Dart enums such that they would inherit from a numeric type (any of them) and use the same representation, then we could not maintain the type information at run time. In other words, Dart enums would then be just as leaky as C# enums.
I'm not saying that the "enums are just bits" approach of C# is wrong, I'm just saying that it is completely different from the approach in Dart where every object maintains its own integrity by having a specific type.
So it doesn't make sense if you claim that "Dart enums could just do the same thing as C#", because that would be massively breaking. And I'm also not at all convinced that the Dart approach is wrong. It just has different trade-offs, and different trade-offs correspond to different software designs.
So of course you will be unhappy if you insist on writing C# enums in Dart using Dart enums. That's probably true for any language mechanism from different languages, for instance, you shouldn't try to write Haskell functions in C.
the problem is that Dart enums are objects.
That is a choice, and I'm probably not the only person who thinks that the discussion stops if you just burst ahead and insist that it is a problem.
Different trade-offs in programming languages give rise to different software designs, and it makes a lot of sense that you would use enums in C# in a very different way than you would use enums in Dart.
But if the C# mechanism is useful then it is certainly also useful to investigate how a similar mechanism could be expressed in Dart. But it won't be called "enums"!
I happen to believe that views would be a good starting point, because they are specifically targeted at building a static harness around a low-level representation, and they allow for all operations to be resolved statically (so they can be inlined, etc). Coincidentally, views are also leaky in the sense that they allow the underlying representation to be accessed (say, a view `V` on `int` could have a method `getInt` that returns the int, and we could then adopt a completely different view on that int).
I'll stop now, because it gets overwhelming to respond to every detail.
I won't promise to respond to long discussion threads. But, @JohnGalt1717, please keep in mind that if you can't push the world into a shape that fits your head, you may need to reshape your head a little bit such that it fits the world.
The Dart IDE integration allows completing `EnumClass.name` where something of type `EnumClass` is required. However, Dart enums are objects, and a large system using complex protobufs might need to allocate ~10000 such enum objects (this is actually happening). That is a serious memory and start-up time impact on the application.
The currently available alternative is to just use constant integers, like:
However, that pattern is less readable, less writable and doesn't complete well in the editor. You have to write `EnumClass` before it lets you complete to `.foo`.
We could introduce a special kind of type-aliased enum, say:
That is equivalent to the above class, and lets `EnumClass` be used as an alias for `int`. It also lets the code completion recognize it as an enum. (If we want to really treat it as a closed enum, we could allow assignment from `EnumClass` to `int`, but not the other direction. If we just want to treat it as an `int` alias and allow `foo + baz` to have type `EnumClass`, then that won't work.)
We could generalize that and introduce type aliases for any type, and allow static declarations on those types:
and improve code completion to include all static members returning something of the same type as the containing type. So, if the context type is `EnumClass`, it would propose completing to `EnumClass.foo`, `EnumClass.bar` and `EnumClass.baz`, just as for an enum, and we would also complete static factory methods, not just constructors.
Or maybe we can use static extension types (#42) to get the type alias, and maybe even cast functions to/from `int` on `EnumClass`. If we plan to get static extension types, we should make sure that whatever we do here will work with that syntax too.