Closed hamsbrar closed 10 months ago
The name is required by toString, stated explicitly on the language specification, and toString is notoriously hard to tree shake.
Can't I just nuke T.toString when T.toString is not in-use and there's no upcast?
Like using E1.descriptiveName.index forces the compiler to preserve E2.descriptiveName.index as well, while it should've stripped both Enum.index and E2.descriptiveName.index. You're probably trying that already, or it probably isn't worth the effort (for fields), or that's probably not how it's done (for fields/methods).
The name seems to be there for debugging purposes. The specification doesn't seem to have any notion of production mode, so I'm left with minification as the only option, which the compiler currently isn't doing.
Can't I just nuke T.toString when T.toString is not in-use and there's no upcast?
Unfortunately, .toString() is called from many places that can accept anything, for example, Iterable.join and StringBuffer.write.
It is essentially impossible to efficiently prove that the enum values do not pass to those places. The kind of query 'does X definitely not flow here' generally amounts to 'find all the things that flow here and see that X is not included'. If we have a concise summary for 'all the things', e.g. 'an Object' or 'an Enum', then X is included and the test fails to remove X. If we make a precise summary, it is huge, and its computation and representation are generally superlinear in the size of the program. It just doesn't work for real-world applications. Other ways of answering that kind of query are also expensive.
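A minimal sketch of the point about implicit toString calls, assuming a made-up enum `Color` and two helper functions that are not from the thread. Neither helper mentions toString, and both accept Object?, yet both end up calling toString on the enum value:

```dart
// Illustrative enum; the name `Color` is an assumption, not from the thread.
enum Color { red, green }

/// Joins values; Iterable.join calls toString() on each element.
String joinAll(Iterable<Object?> values) => values.join(', ');

/// Writes one value; StringBuffer.write calls toString() on its argument.
String writeOne(Object? value) => (StringBuffer()..write(value)).toString();

void main() {
  print(joinAll(Color.values)); // Color.red, Color.green
  print(writeOne(Color.red)); // Color.red
}
```

From the types alone, nothing in joinAll or writeOne says that an enum will arrive there, which is why a type-based summary has to keep Enum.toString.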
Like using E1.descriptiveName.index forces the compiler to preserve E2.descriptiveName.index as well, while it should've stripped both Enum.index and E2.descriptiveName.index. You're probably trying that already, or it probably isn't worth the effort (for fields), or that's probably not how it's done (for fields/methods).
A further complication is that the base class of enums, _Enum, is where the fields reside.
All it takes is for one enum to need the index or _name, and they all get it.
So it is really an illusion that index can be removed in a real program.
The name seems to be there for debugging purposes. The specification doesn't seem to have any notion of production mode, so I'm left with minification as the only option, which the compiler currently isn't doing.
Changing the name would change the result of Enum.compareByName. This is not something that a compiler should do by itself.
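To illustrate why renaming is observable, here is a small sketch. The enums `Original` and `Renamed` and the helper `indicesSortedByName` are hypothetical; `Renamed` stands in for what a naive minifier might produce if it renamed values in declaration order:

```dart
// Hypothetical enums: `Renamed` models `zebra` -> `a` and `apple` -> `b`.
enum Original { zebra, apple }

enum Renamed { a, b }

/// Returns the declaration indices of [values] after sorting by name.
List<int> indicesSortedByName<T extends Enum>(List<T> values) {
  final sorted = [...values]..sort((x, y) => Enum.compareByName(x, y));
  return [for (final v in sorted) v.index];
}

void main() {
  print(indicesSortedByName(Original.values)); // [1, 0]
  print(indicesSortedByName(Renamed.values)); // [0, 1]
}
```

The same two values come out in a different order once the names change, so a program that sorts by name would behave differently after such a rename.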
The best I can come up with is to rename your enum values, and use the 'enhanced enum' feature to add back the descriptive names in the enum scope, something like:
enum E {
_0,
_1,
_2;
static const descriptiveName1 = _0;
static const descriptiveName2 = _1;
static const descriptiveName3 = _2;
}
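A short usage sketch of the workaround above, assuming it lives in a single library (the underscore-prefixed values are library-private, so only the descriptive aliases are reachable by name from other libraries). The `main` function is added for illustration only:

```dart
enum E {
  _0,
  _1,
  _2;

  static const descriptiveName1 = _0;
  static const descriptiveName2 = _1;
  static const descriptiveName3 = _2;
}

void main() {
  // Source code uses the descriptive alias; the stored name is the short one.
  print(E.descriptiveName1.name); // _0
  print(E.descriptiveName2); // E._1
  print(identical(E.descriptiveName3, E._2)); // true
}
```

The aliases are compile-time constants, so they should also be usable as switch cases; the cost is that .name and toString() now report the short names.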
Unfortunately, .toString() is called from many places that can accept anything, for example, Iterable.join and StringBuffer.write.
It is essentially impossible to efficiently prove that the enum values do not pass to those places. The kind of query 'does X definitely not flow here' generally amounts to 'find all the things that flow here and see that X is not included'. If we have a concise summary for 'all the things', e.g. 'an Object' or 'an Enum', then X is included and the test fails to remove X. If we make a precise summary, it is huge, and its computation and representation are generally superlinear in the size of the program. It just doesn't work for real-world applications. Other ways of answering that kind of query are also expensive.
Here's a rephrased version of logic that's based on my unfounded assumptions:
- IS_REQUIRED T.toStr => CANNOT_CHECK T || CHECK_1 T.toStr || CHECK_2 T.toStr || CHECK_3 T.toStr
- CHECK_1 T.toStr => there exists a static expression of type T.toStr
- CHECK_2 T.toStr => there exists a cast(implicit/explicit) from T to T0 && IS_REQUIRED T0.toStr
- CHECK_3 T.toStr => there exists T1 that inherits T.toStr implementation && IS_REQUIRED T1.toStr
- CANNOT_CHECK T => T is dynamic || T is escaping program(things like passed to externals)
(ignore termination issues, if there are any, I'm just trying to explain, the best I can, currently)
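class A { toString() => 'a'; }
class B extends A { toString() => 'b'; }
class C extends B { }
class D extends C { toString() => 'd'; }
class E extends D { toString() => 'e'; }
void main() {
  // Static call on C, which inherits B's implementation.
  print(C().toString());
  // E is passed as Object? to StringBuffer.write, which calls toString.
  print(StringBuffer()..write(E()));
}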
Run the above logic against every toString, probably two times:
class A { /* toString() => 'a'; */ } // removed, IS_REQUIRED(A.toString) => false
class B extends A { toString() => 'b'; } // kept, CHECK_3 passes, B.toString is inherited by C and IS_REQUIRED(C.toStr)
class C extends B { }
class D extends C { /* toString() => 'd'; */ } // removed, IS_REQUIRED(D.toString) => false
class E extends D { toString() => 'e'; } // kept, CHECK_2 passes, CANNOT_CHECK(E) => true
Of course this is a simple program and the compiler could've resulted in just print('b') by eliminating the unreachable code first, but I'm curious about performance implications and how stupid these checks really are on "real-world applications"?
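The checks can be tried out on the A to E example with a toy model. This is only a sketch under stated assumptions: the names `ClassInfo`, `ToyProgram` and `exampleProgram` are hypothetical, the program facts (one static call on C, one cast from E to Object, Object treated as escaping) are entered by hand, and none of this reflects how dart2js actually works:

```dart
// Toy model of the IS_REQUIRED / CHECK_n rules; all names are hypothetical.
class ClassInfo {
  final String name;
  final String? superName;
  final bool declaresToString;
  const ClassInfo(this.name, this.superName, this.declaresToString);
}

class ToyProgram {
  final Map<String, ClassInfo> classes;
  final Set<String> staticCalls; // CHECK_1: types with a static toString call
  final Map<String, Set<String>> casts; // CHECK_2: T -> types T is cast to
  final Set<String> uncheckable; // CANNOT_CHECK: dynamic or escaping types

  ToyProgram(this.classes, this.staticCalls, this.casts, this.uncheckable);

  /// The class whose toString implementation [type] actually runs.
  String implementationFor(String type) {
    var current = classes[type]!;
    while (!current.declaresToString) {
      current = classes[current.superName]!;
    }
    return current.name;
  }

  bool isRequired(String type, [Set<String>? visiting]) {
    final seen = visiting ?? <String>{};
    if (!seen.add(type)) return false; // Crude cycle guard for the sketch.
    if (uncheckable.contains(type)) return true; // CANNOT_CHECK
    if (staticCalls.contains(type)) return true; // CHECK_1
    for (final target in casts[type] ?? const <String>{}) {
      if (isRequired(target, seen)) return true; // CHECK_2
    }
    for (final other in classes.values) {
      final inherits = other.name != type &&
          !other.declaresToString &&
          implementationFor(other.name) == type;
      if (inherits && isRequired(other.name, seen)) return true; // CHECK_3
    }
    return false;
  }
}

/// Facts for the A..E example, entered by hand.
ToyProgram exampleProgram() => ToyProgram(
      {
        'Object': const ClassInfo('Object', null, true),
        'A': const ClassInfo('A', 'Object', true),
        'B': const ClassInfo('B', 'A', true),
        'C': const ClassInfo('C', 'B', false),
        'D': const ClassInfo('D', 'C', true),
        'E': const ClassInfo('E', 'D', true),
      },
      {'C'}, // C().toString()
      {'E': {'Object'}}, // E passed where Object? is expected
      {'Object'}, // Object escapes into StringBuffer.write
    );

void main() {
  final program = exampleProgram();
  for (final name in ['A', 'B', 'D', 'E']) {
    print('$name.toString kept: ${program.isRequired(name)}');
  }
}
```

On these hand-entered facts the model keeps B.toString and E.toString and drops A.toString and D.toString, matching the annotated listing. Note that the result hinges on the casts map containing only E: any other type that is ever cast to Object would be kept as well.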
A further complication is that the base class of enums, _Enum, is where the fields reside. All it takes is for one enum to need the index or _name, and they all get it. So it is really an illusion that index can be removed in a real program.
I understand now. I was thinking of moving Enum.index to E1 if that's the only enum class using it. Again, it probably isn't worth the effort for fields (especially when I'm only doing it for straightforward ones, as doing that for super constructors with complex init logic might be unsafe), or this is not the way.
Changing the name would change the result of Enum.compareByName. This is not something that a compiler should do by itself.
If Enum.compareByName is the only issue then I see two options:
Enum.compareByName is not in-use.
If the first option is infeasible (or tree shaking this is hard) and sorting is expensive, then how about something like this:
enum E {
a, // minify this to: a
aa, // minify this to: aa
ab, // minify this to: ab
b, // minify this to: b
bb, // minify this to: bb
descriptiveName1, // minify this to: d1
descriptiveName2, // minify this to: d2
descriptiveName3, // minify this to: d3
}
// here's how I might try to do it
// just a try. you/others probably know way better methods to get these things done.
void main() {
var minifier = OrderPreservingMinifier();
for (var value in E.values) {
minifier.considerThisLiteralDuringMinification(value.name);
}
for (var value in E.values) {
print('Minified: ${value.name} to: ${minifier.minifyThisLiteralPls(value.name)}');
}
}
class OrderPreservingMinifier {
final root = OrderPreservingMinifierNode(parent: null);
void considerThisLiteralDuringMinification(String literal) {
var node = root;
for (final char in literal.split('')) {
node = (node.children[char] ??= OrderPreservingMinifierNode(parent: node));
}
node.isEndOfLiteral = true;
}
String minifyThisLiteralPls(String literal) {
var result = '';
var node = root;
for (final char in literal.split('')) {
node = node.children[char]!;
var isEndOfLiteral = node.isEndOfLiteral;
var siblingsCount = node.parent!.children.length - 1;
if (isEndOfLiteral || 0 < siblingsCount) {
result += char;
}
}
return result;
}
}
class OrderPreservingMinifierNode {
final OrderPreservingMinifierNode? parent;
final children = <String, OrderPreservingMinifierNode>{};
OrderPreservingMinifierNode({required this.parent});
/// Whether a literal from literals being minified ends at this position.
///
var isEndOfLiteral = false;
}
One thing: this considers every character in an identifier a valid leading character for an identifier:
enum E {
a1, // minifies to: 1
a2, // minifies to: 2
a3, // minifies to: 3
}
I don't want to complicate it but this is something that can be changed to match with what's required, if it works.
Here's a rephrased version of logic that's based on my unfounded assumptions:
- IS_REQUIRED T.toStr => CANNOT_CHECK T || CHECK_1 T.toStr || CHECK_2 T.toStr || CHECK_3 T.toStr
- CHECK_1 T.toStr => there exists a static expression of type T.toStr
- CHECK_2 T.toStr => there exists a cast(implicit/explicit) from T to T0 && IS_REQUIRED T0.toStr
- CHECK_3 T.toStr => there exists T1 that inherits T.toStr implementation && IS_REQUIRED T1.toStr
- CANNOT_CHECK T => T is dynamic || T is escaping program(things like passed to externals)
It's not wrong. And while this looks complicated, what's really happening is so much worse :)
Notice that:
- CHECK_1 applies to Object.toStr (and Object?.toStr). There are definitely places where those happen
- operator== usually has parameter type Object.
- Ergo, any v1 == v2 makes the type of v2 require toString.
Same for any value passed to Iterable.contains, Map.removeKey, identical, Object.hash or print. One of those is not a false positive, but the types won't tell you which.
Actual tree-shaking needs to be much more precise than these rules, and try to guess which instances can actually flow from A to B. It's not enough to look at the types on the way, because then nothing would get tree-shaken, because everything is eventually Object?.
Then there are generics.
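// MyEnum is not defined in the thread; a one-value enum is assumed here.
enum MyEnum { v1 }

class ToStringer<T> {
  String toStringer(T value) => value.toString();
}

void main() {
  // T is bound to MyEnum; toString is only ever called on a T.
  print(ToStringer<MyEnum>().toStringer(MyEnum.v1));
}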
There is no occurrence of MyEnum.toString() in the program. There is no up-cast from MyEnum to another type, because T is bound to MyEnum where it's used.
So we also need to recognize which types T can possibly be, at the point where we do T.toString().
And tree shaking does some of this well, but toString is used so much, in so many places and at so many types and generic types, that it's basically unshakable, unless you keep your type from ever flowing anywhere.
Thanks for the engaging discussion, I hope the answers above provide a bit more clarity into the complexity of this issue! I'm just marking this as "closed as not planned" based on the explanations above.
Hello!
Apologies for putting multiple things in the same issue, my bad.
There are two issues here:
The second issue has been addressed (name is required by toString, and tree shaking toString is hard using the method I suggested). I'll open a new issue if there's anything new related to that.
The first issue is relatively easy. Sorting and then minifying is the safest option. This could be added behind a compiler flag (probably a generic one for such optimizations). One more alternative that doesn't require sorting (so no flags) is also suggested. That can be further improved if it's found to be correct.
so first issue is still pending
(and I can't re-open this thread, it doesn't seem to have any option for a nobody)
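A minimal sketch of the "sort, then minify" option, to make the idea concrete. The function name `minifyPreservingOrder` is hypothetical, the sketch works on plain strings rather than inside a compiler, and it assumes the only ordering that must survive is String.compareTo on the names:

```dart
/// Maps each name to a short replacement whose String.compareTo order
/// matches the order of the original names. Hypothetical helper.
Map<String, String> minifyPreservingOrder(Iterable<String> names) {
  final sorted = names.toSet().toList()..sort();
  // Use a fixed width so that comparing short names compares their ranks.
  var width = 1;
  for (var capacity = 26; capacity < sorted.length; capacity *= 26) {
    width++;
  }
  final result = <String, String>{};
  for (var rank = 0; rank < sorted.length; rank++) {
    // Encode the rank as a fixed-width base-26 number using 'a'..'z'.
    final chars = List.filled(width, 'a');
    var rest = rank;
    for (var i = width - 1; i >= 0; i--) {
      chars[i] = String.fromCharCode(97 + rest % 26);
      rest ~/= 26;
    }
    result[sorted[rank]] = chars.join();
  }
  return result;
}

void main() {
  final minified = minifyPreservingOrder([
    'a', 'aa', 'ab', 'b', 'bb',
    'descriptiveName1', 'descriptiveName2', 'descriptiveName3',
  ]);
  print(minified['a']); // a
  print(minified['descriptiveName1']); // f
}
```

Because every short name has the same length and is assigned by sorted rank, comparing two short names gives the same sign as comparing the originals. It still changes what .name and toString() return, which is the objection raised elsewhere in this thread.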
Thanks for clarifying, I missed that detail.
@hamsbrar Can you say more about the impact of the enum names on your application? Does it prevent you from shipping an application? Does it increase the size by X% where X% clearly a significant overhead? Is there some other negative impact?
Can you say more about the impact of the enum names on your application?
There are 1,223 enum values(239 enum types). Minification removes 15,938 out of 19,530 characters.
These are the numbers I get using the method shared in this thread. An improved version of that method (or using sorting+minification) will remove more. Also, this doesn't include many enums that I failed to extract and many that live in third-party packages, the framework (not Flutter), and the SDK/core libraries.
Does it prevent you from shipping an application?
No.
Does it increase the size by X% where X% clearly a significant overhead?
1-3% (3% when everything is lazy, 1% when most are not).
I'm not bothered by the other 97-99%. There definitely are places where I could've done better, like way better than what I actually did. Sometime I just fail not to make stupid mistakes and then things stop being worthy of any effort. But this 1-3% are the characters that no-one on this planet need and compiler knows that.
@lrhn
Notice that:
- CHECK_1 applies to Object.toStr (and Object?.toStr). There are definitely places where those happen
- operator== usually has parameter type Object.
- Ergo, any v1 == v2 makes the type of v2 require toString.
Same for any value passed to Iterable.contains, Map.removeKey, identical, Object.hash or print.
Actual tree-shaking needs to be much more precise than these rules, and try to guess which instances can actually flow from A to B. It's not enough to look at the types on the way, because then nothing would get tree-shaken, because everything is eventually Object?.
I understand now.
And tree shaking does some of this well, but toString is used so much, in so many places and at so many types and generic types, that it's basically unshakable, unless you keep your type from ever flowing anywhere.
Before I try making a bigger nuke, I do want to know whether toString is actually being used by the example (in this issue)? And is the current mechanism failing to eliminate it, or is it not trying at all for toStrings?
Also, is this one using toString too:
enum E {
  a, b;
  static E random() => false as dynamic ? a : b;
}

void main() {
  switch (E.random()) {
    case E.a:
    case E.b:
    //..
  }
  if (E.random() == E.a) print('a');
  else print('b');
}
Also,
- CHECK_1 T.toStr => there exists a static expression of type T.toStr
- CHECK_2 T.toStr => there exists a cast(implicit/explicit) from T to T0 && IS_REQUIRED T0.toStr
I see why these checks fail in so many cases, cases which will be somewhat similar to ==.
But these checks can try taking == more seriously:
- REQUIRED T.toStr scope => CANNOT_CHECK T || CHK_1 T.toStr scope || CHK_2 T.toStr scope || CHK_3 T.toStr scope
- CHK_1 T.toStr scope => there exists a static expression of type T.toStr in scope
- CHK_2 T.toStr scope => CHK_2_A T.toStr scope || CHK_2_B T.toStr scope || CHK_2_C T.toStr scope
- CHK_2_A T.toStr scope => there exists a cast T to T0 within scope && REQUIRED T0.toStr scope
- CHK_2_B T.toStr scope => there exists a call to method m in scope && REQUIRED T.toStr (scopeof m)
- CHK_2_C T.toStr scope => there exists a call to method m in scope taking T as T0 && REQUIRED T0.toStr (scopeof m)
- CHK_3 T.toStr scope => there exists T1 that inherits T.toStr implementation && REQUIRED T1.toStr scope
- CANNOT_CHECK T => T is dynamic || T is escaping program(things like passed to externals)
If I'm not sure about which m (from the type that encloses m) gets called in CHK_2_B || CHK_2_C then I'll check all ms (in the hierarchy) that can get called at that point.
(CHK_2_B is meant to check methods that are accessing a stored instance of T somehow, but if an instance can be stored as a super type, accessed and missed by the other checks, then this check requires more work. Bit late, will check it later.)
Then there are generics.
Can I see this:
// this
void f(A a, B b) {}
// as
void f<T extends A, V extends B>(T a, V b) {}
// and vice-versa
// ----------------------
// this
void f<T, V>(T a, V b) {}
// as
void f<T extends Object, V extends Object>(T a, V b) {}
// and vice-versa
// ----------------------
// this
void f<T extends Object, V extends Object>(T a, V b) {}
// as
void f(Object a, Object b) {}
// and vice-versa
// ----------------------
// this
class ToStringer<T> {
String toStringer(T value) => value.toString();
}
// as
class ToStringer {
String toStringer(Object value) => value.toString();
}
// and vice-versa
Doing this for ToStringer might require something more if I can access T as whatever type it's bound to. And something like void f<T extends A implements B, C>() {} definitely requires venturing into one more uncharted territory :/
I'm not bothered by the other 97-99%. There definitely are places where I could've done better, like way better than what I actually did. Sometime I just fail not to make stupid mistakes and then things stop being worthy of any effort. But this 1-3% are the characters that no-one on this planet need and compiler knows that.
The compiler doesn't know that because some developers do need the name.
We can't minify the name. Developers have come to expect the current behaviour of Enum.toString, and the name is also available to general code via .name.
Before we added .name, we would see folks doing enumValue.toString().split('.').last.
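For readers who haven't seen the older idiom, here is a small sketch comparing it with .name. The enum `Status` and the helper `nameViaToString` are made up for illustration:

```dart
enum Status { active, suspended } // hypothetical enum for illustration

/// The pre-.name idiom: strip the "EnumType." prefix from toString().
String nameViaToString(Enum value) => value.toString().split('.').last;

void main() {
  print(nameViaToString(Status.active)); // active
  print(Status.active.name); // active
}
```

Both forms observe the declared name at runtime, so code written either way would see a difference if the compiler shortened the names.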
The compiler doesn't know that because some developers do need the name.
We can't minify the name. Developers have come to expect the current behaviour of Enum.toString, and the name is also available to general code via .name. Before we added .name, we would see folks doing enumValue.toString().split('.').last.
How about minifying when there's no expression with static type Enum.toString or Enum.name?
But I do stand corrected, there is a possibility that someone on this planet might be using toString.split on Object, but I guess they'll be okay with a little surprise. I did write thousands of lines of code in development mode using '$SomeClassName' at some point and then got surprised when that didn't work in production mode. I definitely don't see who is to blame here, the person who wants (enumValue as Object).toString.split or the person who gives them what they need in a way that screws everyone else up :/
Also, a simple flag is always an option.
Took a deeper look today.
Including a new flag in -O4 will probably break more code than I initially thought (Flutter is doing toString.split on dynamic). Not including the flag in -O4, or adding -O5 just for minifying enum names, is too much. Pragma directives (e.g. dart2js:minify, dart2js:no-minify) will work but everyone (including the compiler) will be maintaining them for who knows how long. The worst part is that, with these options, users will be forced to keep minified names even when they aren't using them (because names are required by Enum.toString). Or in other words, they'll still be paying for others, just less than what they're paying now.
Looking at this from a different angle, there's nothing that prevents the compiler from completely eliminating names if they aren't required by toString. If a user is using names (directly via Enum.name or through compareByName), they will appreciate that the compiler not only kept the names but also prevented them from getting minified.
So what's actually required:
toString or Remove Enum.name from toString.
compareByName.
These changes are going to break a lot of code so this probably isn't a good idea. Maybe do this in future or maybe don't. This is the case where I can afford building the SDK with the changes I want/can make.
This issue can be closed. thanks (:
I can share accurate numbers now:
# with names
Compiled 14,721,529 input bytes (9,766,634 characters source) to 1,207,213 characters JavaScript in 8.18 seconds using 450.852 MB of memory
# without names
Compiled 14,721,377 input bytes (9,766,581 characters source) to 1,181,984 characters JavaScript in 8.06 seconds using 467.418 MB of memory
These are the results of dart compile js -O4 and there are 238 part files. If you're wondering why there are so many enums in such a small application, well, I'm writing templates in HTML and the templates weigh far more than the Dart output, but templates are compiled to many .JS files (just like part files) and loaded on demand (just like deferred loading in Dart).
I can share more information but I doubt you'll need that (--enable-analytics was on, in both SDKs).
I removed these two lines:
Also,
- IS_REQUIRED T.toStr => CANNOT_CHECK T || CHECK_1 T.toStr || CHECK_2 T.toStr || CHECK_3 T.toStr
- CHECK_1 T.toStr => there exists a static expression of type T.toStr
- CHECK_2 T.toStr => there exists a cast(implicit/explicit) from T to T0 && IS_REQUIRED T0.toStr
- CHECK_3 T.toStr => there exists T1 that inherits T.toStr implementation && IS_REQUIRED T1.toStr
- CANNOT_CHECK T => T is dynamic || T is escaping program(things like passed to externals)
Don't know why but these checks don't look as stupid as I initially thought.
The whole program (or should I say all methods that are declared by the user) exists outside Object. Not only that, these simple checks will eliminate Enum.toString from the example in this issue, and from my codebase too (along with 1200+ enum names), even if I don't give a proper treatment to generics (just keep T.toStr when T is passed to generic something).
Either I've no idea what I'm saying or it's not hard, even when you're required to do it in the first attempt 😄
Also, feel free to close it anytime. Thought I should at least share the correct information.
This should do the job, precisely.
// look at this
class Object {
String toString();
String runtimeType();
}
// as
class Object_toString {
String toString();
}
class Object_runtimeType {
String runtimeType();
}
// -------------------------------------------------------
class A extends Object {
void f1() {}
void f2() {}
String toString() {}
}
// as
class A_f1 {
void f1() {}
}
class A_f2 {
void f2() {}
}
class A_toString extends Object_toString {
String toString() {}
}
// -------------------------------------------------------
void useF1(A a){
a.f1();
}
// as
void useF1(A_f1 a) {
a.f1();
}
// -------------------------------------------------------
void useF1F2(A a){
a.f1();
a.f2();
}
// as
void useF1F2(A_f1 & A_f2 a){
a.f1();
a.f2();
}
// -------------------------------------------------------
void useToString(Object o) {
print(o.toString());
}
// as
void useToString(Object_toString o) {
print(o.toString());
}
// -------------------------------------------------------
void useNested(A a) {
useToString(a);
}
// as
void useNested(Object_toString a) {
useToString(a);
}
// -------------------------------------------------------
void useEverything(dynamic o) {
sendToMars(o);
}
Now run these checks, one more time:
- IS_REQUIRED T.toStr => CANNOT_CHECK T || CHECK_1 T.toStr || CHECK_2 T.toStr || CHECK_3 T.toStr
- CHECK_1 T.toStr => there exists a static expression of type T.toStr
- CHECK_2 T.toStr => there exists a cast(implicit/explicit) from T to T0 && IS_REQUIRED T0.toStr
- CHECK_3 T.toStr => there exists T1 that inherits T.toStr implementation && IS_REQUIRED T1.toStr
- CANNOT_CHECK T => T is dynamic || T is escaping program(things like passed to externals)
I wish I could explain more precisely but I think you'll understand what I'm trying to say.
Update after issue got closed:
The approach above is effective (took help from a professional to verify that). And now I know that I can implement it efficiently (there will be some additional work in a few cases and a simple fallback strategy for parameters that I can't proceed further with). They did show me a few missing pieces (but those are obvious ones). I did show them the way I see generics and the kind of generics supported in Dart, and it turns out that there's some work required in that part (access to type arguments in the body etc., but it's doable; they also mentioned some exceptions that require proper handling if I want surgical precision, but those exceptions are a few core types).
So to anyone who is still confused, the problem (tree shaking toString) is/was solved.
The same professional felt sorry for all of you, they really did, and told me to find some place else.
Thanks @hamsbrar for the additional discussion and sharing the numbers. Marking it now as closed :)
Ayi-yi-yi you forgot the label this time (:
Now it's looking good. thanks again.
Output of dart compile js -O4 contains:
Same on the newest dev release (3.3.0-84.0.dev (dev) (Tue Oct 31 17:06:17 2023 -0700)).
Also, not using Enum.name anywhere but it's part of the output, appears to be required by toString: