dart-lang / language

Design of the Dart language

Provide a way to gather static initializers #369

Open davidmorgan opened 5 years ago

davidmorgan commented 5 years ago

Some frameworks need a way to run initialization code for all leaf code using the framework. For example, dependency injection frameworks want to build a map of injectable types, and serialization frameworks want to know about all serializable types.

Currently frameworks either use codegen or manual initialization to do this.

I discussed this with @eernstg and @sigurdm; the idea we arrived at was that it might be possible to build this on top of existing const canonicalization support, giving something that's both cheap to implement and sufficient for the use cases we care about.

Leaf code wanting to register itself would declare a const in some new special way:

@register
const unimportantName = 'value';

@register
const otherUnimportantName = 'otherValue';

When the framework is initialized it can ask for all @registered const values and iterate over them doing whatever initialization work it wants to.

  var registrations = Platform.registeredConsts(); // <Object>['value', 'otherValue']

This is sufficient for e.g. all injectable types or serializable types to register themselves. Note that the value registered could be any const, e.g. a function, or an instance of a const class. A possible refinement would be to make it possible to ask for all registered consts of a particular type:

  var registrations = Platform.registeredConsts<String>(); // <String>['value', 'otherValue']

Arguably this removes the need for the @register annotation, since you can request a framework-specific type that is only used for registration.
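
For concreteness, a hedged sketch of that refinement; Registration and Platform.registeredConsts are hypothetical names from this proposal, not existing Dart APIs:

// Hypothetical framework-specific registration type; asking for all consts of
// this type would make a separate @register annotation unnecessary.
class Registration {
  final Type type;
  final Object Function() create;
  const Registration(this.type, this.create);
}

class Injectable {}

Injectable _createInjectable() => Injectable();

// Leaf code registers itself by declaring a const of the framework's type.
const injectableRegistration = Registration(Injectable, _createInjectable);

// Framework initialization (hypothetical API):
//   for (final r in Platform.registeredConsts<Registration>()) {
//     container.register(r.type, r.create);
//   }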

jakemac53 commented 5 years ago

The implications on treeshaking and modular compilation should be carefully evaluated here.

For treeshaking, any app using these methods would have to retain all annotated consts, regardless of whether they are ever actually used.

For modular compilation, any library using these methods now has a dependency on effectively all transitive libraries (any library could add a new const at any time).

matanlurey commented 5 years ago

Angular was originally built this way, and it was a huge [expletive] mistake.

I understand it "feels" nice to be able to do aspect-oriented programming like this, but it does not pay off when you care about code-size and initial startup, as Dart's users do.

eernstg commented 5 years ago

I've written up an outline of a feature, #371, based on the discussions that we had at the summit and input from several others.

It uses a new kind of declaration, Set<T> S of const;, which goes into the framework (say, in a library L). Each contributor to S must be in a library that imports L; it can then write const S.add(c);, where c must be a constant. S is immutable but not constant, and the main point is that the framework can get access to a bunch of values without depending on all the libraries that those values came from.

There is no control other than "exists in the program". So if you import L1 anywhere and that library adds something to S then you will have it in S at run time (there is no notion of tree shaking or such, and it's not easy to say how you'd achieve that; so take care).

It's up to some code in main (or something called from there) to make anything "interesting" happen: The #371 feature only allows us to gather a bunch of constants into a set. They might be functions used to perform some initialization, and some framework function initialize(...) could then be called in order to perform that work at an appropriate time. Again, some manual coding is needed in order to allow the work to take place in a lazy manner, but that's all done in user-written code, so the framework developers can set up whatever mechanism they want.

#371 covers link time maps as well. I suspect they wouldn't be much harder to support than link time sets, and this might be a useful basic "mechanism" in the above sense.
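
Restated as code for concreteness; this is the proposed syntax from #371, not valid Dart today, and staticInitializers, initFoo, and runStaticInitializers are illustrative names:

// In the framework library L: a link time set that contributors can add to.
Set<void Function()> staticInitializers of const;

// In a contributing library that imports L:
void initFoo() { /* set up Foo's registrations */ }
const staticInitializers.add(initFoo);

// In the framework, called (directly or indirectly) from main:
void runStaticInitializers() {
  for (final init in staticInitializers) {
    init();
  }
}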

sigurdm commented 5 years ago

I agree that it would break modularity, and in general I feel iffy about the fact that the presence of an import can change the semantics of a program.

Wrt. modularity: One solution for DDC would be to generate a static initializer for each library that registers global consts, and is run before main at program startup time (I suppose it already does something like this for constant canonicalization).

The advantage of this compared to general static initialization is that const evaluation has no side effects.

Angular was originally built this way, and it was a huge [expletive] mistake.

Can you elaborate on this? What were the exact things that went wrong for Angular?

davidmorgan commented 5 years ago

Thanks Erik! That looks like a good fit.

Re: tree shaking, I expect that frameworks will provide custom tree shaking code. This is already being looked at for protos. My intuition is that a whole program kernel transform is the correct place to do this work.

Re: modular compile performance, these collections should only be accessed in framework initialization code, which should have a 1:1 correspondence to app entrypoints. Compilation performance should be better than what we have today--because today we are implementing something like this on an ad hoc basis. The compiler should be able to do it more quickly since it already does this work for const canonicalization.

Re: angular. Code size should be addressed by custom tree shaking; startup time needs consideration. If the initializer list is "flat" then startup time will necessarily be proportional to the list size. But the proposal allows for a map; if this is set up so initialization can be done lazily based on the key, then I think startup time should be okay.

I think a good next step for the proposal would be for CFE folks to check if my hand waving about "custom tree shaking code" is at all feasible :) ... I'm not exactly sure what form this would take, but protos are a good example. e.g. perhaps the app needs to supply some additional data at compile time to help the tree shaker figure out what to keep; it could for example say "these packages are the feature code; tree shake const initialization exactly for protos not mentioned here".

eernstg commented 5 years ago

@matanlurey wrote, and @sigurdm commented on:

Angular was originally built this way, and it was a huge [expletive] mistake.

I'd also like to hear more about what went wrong.

Note that the mechanism in #371 will not give rise to additional space for constant objects (every object which is added to a link time set/map is obtained from a constant expression, and presumably you'd need that constant anyway), so the only added space consumption is for some sets/maps to hold those values. Also no computation takes place before main is executed, except for add on a platform set type and operator []= on a platform map type. No user code runs. So the mechanism in itself should not be expensive, and certainly not unpredictably so.

We could store anything in those sets/maps as long as it is a constant, but I'd expect things like functions (that some initialization code in the framework would call, when appropriate) or objects that could be understood as commands (so maybe there's a list of k values, a list of k types, and a function, and then the initialization code should call that function with those type arguments and values, that is, f<Tj>(vj), with j in 1..k). Such commands and their "interpreter" in the framework could be tailored to allow any kind of laziness that the situation calls for, so we shouldn't need to change the link time sets/maps themselves in order to avoid having a long start-up time.
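
As a hedged sketch in ordinary Dart (illustrative names; gathering the Command instances into a set is what the proposed link time set feature would do), such command objects and a lazily invoked interpreter might look like this:

// A constant command object: a description plus the work to perform.
class Command {
  final String description;
  final void Function() run;
  const Command(this.description, this.run);
}

// Framework-side interpreter that runs commands on demand rather than eagerly.
class Initializer {
  final Set<Command> _pending;
  Initializer(Set<Command> commands) : _pending = {...commands};

  // Run only the commands a caller needs right now; the rest stay pending.
  void runWhere(bool Function(Command) needed) {
    final selected = _pending.where(needed).toList();
    for (final command in selected) {
      _pending.remove(command);
      command.run();
    }
  }
}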

Was there something in the original approach taken by Angular which would still be inherently difficult to handle?

sigurdm commented 5 years ago

Re: tree-shaking, I have been speculating about a feature that would allow marking an expression as a "weak" reference with respect to tree-shaking.

The expression would not count towards keeping objects alive, and would return null if what it referenced was tree-shaken away (because it is not referenced anywhere else). I don't know exactly how such a thing could be formalized (currently tree-shaking does not have observable effects on program semantics), and I'm not sure how easy it would be to use.

eernstg commented 5 years ago

@sigurdm wrote:

re: tree-shaking

In this context I envision a typical situation where the link-time sets and maps are rather small, because they would typically contain a bunch of functions to run, or a bunch of "command" objects (again specifying some computation). Running those computations might be very expensive, but it should be possible to partition the tasks into manageable chunks and run them on demand. Couldn't this enable the same kind of resource usage improvements as a tree-shaking feature? We wouldn't run a command before we need the data that it produces, so if it's never needed we've "tree-shaken" it away.

Of course, it might be a complex task to partition the "start-up" computations and perform them lazily at the right point in time, but it would be done in regular Dart code so we can use whatever techniques we can think of.

yjbanov commented 5 years ago

@eernstg

presumably you'd need that constant anyway

This assumption is wrong, and it is where the Angular approach fell apart. The issue was that a library would statically declare, say, 10 @Injectable and/or 10 @Component and/or 10 @Directive classes. However, of those 30 classes only one would be used by a downstream application. Since all these classes were aggregated into a top-level Map<Type, TypeFactory>, the tree-shaker couldn't remove any of the classes (TypeFactory is a function that instantiates the class): the map could be iterated over and each TypeFactory called.
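
A hedged sketch of that pattern (illustrative names, not the actual Angular code); because the map itself is reachable and can be iterated, every factory in it has to be retained even if the application only ever looks one of them up:

typedef TypeFactory = Object Function();

class InjectableA {}
class InjectableB {} // ...and many more classes like these.

// The aggregated registry: reachable as a whole, so nothing in it can be shaken.
final Map<Type, TypeFactory> typeFactories = {
  InjectableA: () => InjectableA(),
  InjectableB: () => InjectableB(),
};

Object create(Type type) => typeFactories[type]!();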

Note that there was a similar attempt in the past by @sigmundch (but it never got adopted). The idea was to provide a new kind of map, LookupMap, which could only be const and only allowed key lookups. Iterating over the map would be banned. The tree-shaker would compute the union of all keys that could possibly flow into this map and remove unused key/value pairs. This gives you a semblance of having reflection into a library, but in a tree-shakable way.

matanlurey commented 5 years ago

This should straight up and simply not be done. It is a mistake. Interestingly enough, I don't see much of a problem statement here, only a solution. We've talked before to lots of teams, both external and internal, who thought they wanted this, only to be (easily) talked out of it when the costs and benefits were weighed.

(While it seems simple enough for constants, ultimately teams will want to register non-const objects and functions, and then all assumptions are completely out of the window, as you have accidentally implemented mirrors by another name.)

@davidmorgan:

Re: tree shaking, I expect that frameworks will provide custom tree shaking code. This is already being looked at for protos. My intuition is that a whole program kernel transform is the correct place to do this work.

This is a terrible expectation. Writing a generator is tough enough; expecting teams and frameworks to write global analysis steps is, full stop, a non-starter. The only reason it works for protos is the small scope of the project (only for the VM, and written/maintained by the Kernel/VM team).

Re: angular. Code size should be addressed by custom tree shaking

Sorry, this is just a really bad assumption and idea. I don't have a better way of stating it. We spent literally 5 years thinking of different ways to mitigate the cost of collecting metadata and decided the only way forward was to deprecate it (initReflector()).

@eernstg:

I'd also like to hear more about what went wrong.

There is a bit of history in these docs:

I'd be happy to GVC/meet in person to go over details.

mkustermann commented 5 years ago

I don't see much of a problem statement here, only a solution.

This discussion originated with a very concrete problem: Decoding protobuf messages with extensions, where the decoder doesn't know about which extensions are available. See b/131893790 for further details.

Writing a generator is tough enough, expecting teams and frameworks to write global analysis steps is, full-stop, a non-starter.

That depends on the project, the team, and the benefit it brings. If there is a huge saving in size or increase in performance, a "full-stop non-starter" becomes "the right thing to do".

Given that Angular seems to have a similar problem: What solution do you currently use? What solution do you recommend?

matanlurey commented 5 years ago

This discussion originated with a very concrete problem: Decoding protobuf messages with extensions, where the decoder doesn't know about which extensions are available. See b/131893790 for further details.

IMO, this is a flaw in how the protobuf library was originally developed, not in Dart. For example, @yjbanov's original gAPI-compatible "Streamy" library (now deprecated) never had this problem: if you didn't use certain entity objects (read: protos) or fields, the information about decoding/encoding them was tree-shaken. This was back in Dart 1.x to boot.

(As far as I can tell, the JSPB/JS-Closure solution internally also supports tree-shaking w/o hacks)

Given that Angular seems to have a similar problem: What solution do you currently use?

A combination of things:

As for how this relates to protocol buffers, I am not sure there is an easy fix. I know @ferhatb and others have talked about having a "Nano Proto" or "Proto lite" library similar to what was developed for Java internally, and I'm a huge proponent of this idea.

sigmundch commented 5 years ago

I think it's worth exploring whether there is a protobuf specific design for the problem of extensions, first. For example, as @jakemac53 suggested elsewhere, could decoding be more lazy when it encounters extensions?

Indeed, if we do end up considering static initializers, we need to be extremely cautious about tree-shaking. I also want to share a couple of thoughts from previous approaches:

pulyaevskiy commented 5 years ago

A very raw idea, but a similar UX could be achieved without introducing reflection-like features by making build/codegen more tightly integrated with the SDK. Say, running dart myapp.dart could trigger a build step before compiling, or some other combination with a similar effect; dart2js could follow a similar path.

I have no idea about the SDK-level implications of this, but the goal in this case is to make codegen seamless for the user so that it becomes a non-issue.

matanlurey commented 5 years ago

Thanks @sigmundch for the background there!

@pulyaevskiy: This isn't related to code generation being idiomatic or not.

pulyaevskiy commented 5 years ago

@matanlurey I'm sure you have more context on this subject, but I'm wondering if you could elaborate on why you think it's different?

Just to quote problem description from the original post:

Currently frameworks either use codegen or manual initialization to do this.

The above tells me that the main reason for this request is that there is a required manual action needed for frameworks to compose their registries, and we'd like to automate it, in this case by introducing reflection. I just wanted to point out that if we can automate this manual action then it'd become a non-issue. Though I'd defer to @davidmorgan to either confirm or deny my assumption, since he's driving this.

eernstg commented 5 years ago

I wrote, and @yjbanov responded:

presumably you'd need that constant anyway

This assumption is wrong, and it is where the Angular approach fell apart

That's obviously a very important experience. So that assumption was wrong for Angular, and it sounds like the situation ("the program imports many classes of this kind, but will use only a small subset of them") could easily occur in other large systems as well, especially in situations where those link time collections contain Type instances (anywhere, e.g. as set elements or map keys), or other objects that depend on those "many classes" in any way, directly or indirectly (such as a function or ComponentFactory or similar which "knows" a type that we might not use).

So there is no danger in using a link time collection to hold a bunch of objects with no/few dependencies, but it does not scale up when they have any "potentially shakeable" dependencies.

LookupMap .. tree-shaker would compute the union of all keys that could possibly flow into this map and remove unused key/value pairs.

If that idea does actually work (even though it wasn't adopted previously), and if it can be made sound (when the analysis says a key couldn't possibly flow into the map, that must actually be true), then it could directly be used for link time maps.

But do we then conclude that a mechanism like link time collections is never useful in practice? Or do we conclude that it doesn't scale in some situations, but it is actually useful in other situations, and we can develop a sufficiently effective culture of how to use it?

@matanlurey wrote:

This should straight up and simply not be done.

There's a clear message! ;-) And, again, it's obvious that there's a lot of experience backing up this conclusion.

Matan, trying to get at your core reason for this message, I noted the following:

accidentally implemented mirrors ... [we tried to] mitigate the cost of collecting metadata and decided .. to deprecate it

My interpretation is that your main concern is similar to Yegor's: You're convinced that this mechanism does not scale up, because it will inevitably cause programs to hold on to a large number of unused entities.

There is a bit of history in these docs

Thanks, that was helpful!

About how Angular currently handles this issue:

We tell teams to import and use exactly what they need, instead of relying on the framework

For the case where a ComponentLoader is used rather than a SlowComponentLoader, this seems to rely on generated code (*.template.dart) and on an explicit reference to MyCompNgFactory rather than passing the Type (MyComp) to the loader.

Does this imply that your advice would be to always use code generation in the situations where the naive developer would otherwise be tempted to use a link time collection? ;-)

@sigmundch wrote:

we once had a package that added static initializers as a transformer

Sounds like the use of that package could very easily cause the same kind of tree-shaking problems that we have on the table here already.

But if it were ever appropriate to use that package then I would assume that link time collections could be used to achieve the same result (by collecting a set of initializing functions into a link time set; the only manual step would then be to add callStaticInitializers() to main). This would then allow us to eliminate one reason to have a code generation step, which might be important if nothing else forces such a step to exist.

@pulyaevskiy wrote:

the main reason for this requests is .. a required manual action .. to compose .. registries and we'd like to automate it. In this case by introducing reflection.

With something like link time collections there would still be a need for a manual action: main must (directly or indirectly) do something with the constant objects that have been gathered (e.g., if they are functions then main would need to call something like initFrameWork() that knows how to call those functions).

So the point would be that there is no need for the developer who writes an application to know which of all the (directly or indirectly) imported libraries have "things" that need to be gathered: that's something that the relevant library/package developers will register (e.g., using something like const staticInitializers.add(myInitializer);).

In that sense it's more about reducing the amount of "administration" that the application developer is required to perform (that is: it's more about improving the encapsulation of libraries-that-register-something) than it is about avoiding a tiny manual step (which is simpler, because it's per framework, not per library).

But if we have a language construct (like const myLinkTimeSet.add(c)) then we could actually connect it to the tree-shaking machinery:

We could make this mechanism dependent on the outcome of the overall tree-shaking process, using something like the following:

// Specify the classes/functions that will enable the addition of `c`.
const myLinkTimeSet.add(c) if MyClass, myFunction;

The effect would then be that c, MyClass, myFunction (and so on for the specified if list) would be ignored for the purposes of building up the database of which entities in the program are being used (that is, "during tree-shaking", even though it's actually upside-down). At the end of the regular tree-shaking process it would be noted whether or not any of MyClass, myFunction (etc.) will be included, and if any of them is used then c will be added (plus derived dependencies).

This implies that tree-shaking would be extended from the current approach to iteratively add each contribution to a link time collection, but that algorithm already needs to propagate dependencies transitively.

In any case, the choice of dependencies is sound in the following sense: If c is actually included then the regular tree-shaking algorithm will ensure that all dependencies thereof are included as well. And if the contribution is not triggered then c will not be added to myLinkTimeSet at all, and hence it's correct to skip it during tree-shaking. So if that list of triggers is incorrect then c may be missing from myLinkTimeSet, but the overall consistency of the program is always preserved in the sense that all reachable classes/functions will be loaded.

So maybe that would allow us to avoid some custom tree-shaking, and still have our cake? ;-)

yjbanov commented 5 years ago

@eernstg

But do we then conclude that a mechanism like link time collections is never useful in practice? Or do we conclude that it doesn't scale in some situations, but it is actually useful in other situations, and we can develop a sufficiently effective culture of how to use it?

I would not conclude that based on my experience. Since we didn't get a chance to put LookupMap into practice, it's hard to say how well it could scale. Link time collections might actually work. My intuition is that it is possible to come up with a design that works. As @sigmundch points out, this concept relies on a few assumptions. If those assumptions could be guaranteed at the language level, e.g. only allow const key reads/writes, disallow iteration, disallow key/value enumeration, then this approach might be effective. Of course, it would also have to not be so constrained that it stops being useful :)

eernstg commented 5 years ago

@yjbanov wrote:

Since we didn't get a chance to put LookupMap in practice, it's hard to say how well it could scale.

I discussed LookupMap with @johnniwinther, and I got the impression that they could very well come back. However, I think link time collections are in some ways a generalization of LookupMap, because they are "distributed":

// Using LookupMap.
const map1 = LookupMap([A, 1], [B, 2], [C, 3]);

// Using a link time map.
Map<Type, int> map2 of const;

const map2[A] = 1 if A;
const map2[B] = 2 if B;
const map2[C] = 3 if C;

The point is that the declaration of map2 can be in a centralized, well-known location (for instance, dart:ui) and the contributions (the const map2[..] = .. declarations) can be in other libraries as long as they can see map2. This enables other code to depend on map2 without depending on A, B, and C, whereas any code that has map1 in scope would also have to have an import path to A etc. This means that developers using LookupMap will need to create and maintain more intricate import graphs; for example, map1 can never be declared in dart:ui if it is intended to contain entities declared outside the core framework of Flutter, but map2 can do that.

The tree shaking properties could be the same: With LookupMaps, if no expression in the program can yield the Type for A then map1 would not contain that entry. Similarly, const map2[A] = 1 if A; is ignored if A is not used. (I haven't spelled out any details of what it means to be "not used", just that it will be able to rely on the existing tree shaking mechanism, but it should certainly be possible to make it work like LookupMaps here.)

A LookupMap can also include a whole other LookupMap in one step, but if that turns out to be crucial we could also have a const myLinkTimeCollection.addAll(c) if ...; declaration for both sets and maps.

LookupMaps also seem to allow overwrites, but I suspect that we'd want to differ here and keep it an error for a link time map to attempt to set two values for the same key.

davidmorgan commented 5 years ago

Thanks everyone.

Dependency management is hard. Automatically pulling in dependencies errs on the side of correctness, at the risk of bloated code. Manually maintained dependencies can be wrong in both ways: there can be unneeded or missing dependencies. I've also seen automatic dependencies with manual blacklisting for code size.

FWIW, built_value went from automated (scan the whole program looking for serializable types) to manual (explicitly list serializable types). This was largely because the automated type collection was hard to reason about and the implementation of it was hard to maintain.

My guess is that the best compromise between performance and usability is to end up with a checked-in list of dependencies which is automatically updated; e.g. you add a new serializable type and your presubmit says "add it to the list of serializable types (y/n)?". The configuration also needs to blacklist things you don't want to be serializable (without referring to them!), for the benefit of the presubmit.

We actually could do this with the language feature, by forcing you to say which code you're keeping and which code you're dropping in order to gain access to the link time set/map. Kind of a weird language feature :)

  const map = linkTimeMap.materialize(
    keep: ['package:foo/foo.dart', 'package:foo/bar.dart'],
    drop: ['package:bar/baz.dart'],
  );

sigurdm commented 5 years ago

I think this approach won't support protobuf extensions because I believe extension keys are just numbers or something that can be fabricated at runtime, so we wouldn't know whether something could be tree-shaken.

I think we could use it (or something similar) for protobuf extensions; they are represented as a kind of descriptor objects, as here: https://github.com/dart-lang/protobuf/blob/master/protoc_plugin/lib/src/dart_options.pb.dart#L89

You access the imports extension field of a message using GeneratedMessage.getExtension(): https://github.com/dart-lang/protobuf/blob/master/protoc_plugin/lib/file_generator.dart#L39

If that object is never referenced, we can assume that extension is not used.
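
For reference, a hedged sketch against today's package:protobuf API (the concrete message and extension are whatever the generated code provides); the point is only that the extension's descriptor object must be referenced for the extension to be usable:

import 'package:protobuf/protobuf.dart';

// Referencing the extension's descriptor object is what makes the extension
// "used"; if no such reference survives, the extension could be considered
// tree-shakeable.
Object? readExtension(GeneratedMessage message, Extension extension) {
  if (!message.hasExtension(extension)) return null;
  return message.getExtension(extension);
}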

eernstg commented 5 years ago

@davidmorgan wrote:

say which code you're keeping and which code you're dropping

If that's desired then we could use the conditional contributions that I mentioned here to express that.

// Make it an error to include the types `Foo` and `Bar`.
Set<int> _excludedTypes of const;
const _excludedTypes.add(1 ~/ 0) if Foo, Bar;

The basic idea is that the added expression must be potentially constant in all cases, but it will not be evaluated if Foo and Bar are not present in the set of classes in the program after tree-shaking.

Of course, we could declare some ExclusionOfType class with a constant constructor and an assert initializer to make this much prettier, but this shows the underlying mechanism that we'd be able to use.
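
A hedged sketch of such a helper: the class itself is ordinary Dart, while the contribution shown in the trailing comment uses the proposed syntax and is not valid Dart today.

// Constant evaluation of this constructor always fails its assert, so a
// conditional contribution of an instance turns "an excluded type survived
// tree-shaking" into a compile-time error.
class ExclusionOfType {
  final Type excluded;
  const ExclusionOfType(this.excluded)
      : assert(false, 'An excluded type is present in the program');
}

// With a declaration like `Set<ExclusionOfType> _excluded of const;`, the
// contribution would read (proposed syntax):
//   const _excluded.add(ExclusionOfType(Foo)) if Foo, Bar;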

Inclusion of certain types in the program (based on tree shaking) could be achieved by any usage of the desired types which doesn't get optimized away.

main() {
  // Force inclusion of the following types in the program.
  var useTypes = [foo.Foo, foo.Bar];
  ...
}

Again, we'd probably need a twisted statement or two to make sure that useTypes doesn't get optimized away, but I'm sure we could find a suitable way to abstract this away such that the forced inclusion can be achieved in a readable manner.

Forced inclusion may or may not be needed: For instance, if a particular program is capable of creating and serializing instances of a set of classes M then tree-shaking will yield a program that includes those classes (because code creating an instance C(...) is reachable from main for each C in M); but if a program expects to receive a serialized representation of instances of classes in a set M then for each class C in M it might well look up a deserializing function for C in a link time collection. It would then be a business level "contract" with the sender of that serialized data which classes M should contain, and the inclusion of each class in M would have to be forced. Reachability analysis can't see that any of them could be needed, because that link time collection doesn't contain anything unless it is otherwise known to be needed; for instance, each deserializer could be contributed like const deserializers["C"] = deserializeC if C;, which will only have an effect if a usage of C is reachable—so we can put a usage of C in main to force it if we expect that a serialized C may be received, and then we can look up the appropriate deserializer in the deserializers map using the chosen key (e.g., the class name "C", or whatever it takes).
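
Putting those pieces together as a hedged sketch (C, deserializeC, and deserializers are illustrative names; the two declarations shown in comments use the proposed syntax and are not valid Dart today):

class C {
  final int value;
  C(this.value);
}

C deserializeC(Map<String, Object?> json) => C(json['value'] as int);

// In the framework library (proposed syntax):
//   Map<String, Object Function(Map<String, Object?>)> deserializers of const;
//
// In C's library, contributed only when C survives tree-shaking (proposed syntax):
//   const deserializers['C'] = deserializeC if C;

void main() {
  // The program never constructs C itself, so reachability analysis alone would
  // drop it; this reference forces its inclusion because the business-level
  // contract says serialized C instances may arrive.
  final expectedIncoming = <Type>[C];
  print(expectedIncoming);
  // With the feature, deserializers['C'] would now be populated and could be
  // looked up by the framework when a serialized C is received.
}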

Note that we wouldn't need to worry about how an included type will be used, because (1) when foo.Foo is present after regular tree-shaking then (2) some const mySet.add(c) if Foo; will be enabled (or a similar map thing), and this causes another round of world impact computation to take place, such that everything which is needed in order to be able to use c will exist. And that would presumably be the class Foo in itself, plus some of its methods, plus maybe a FooNgFactory, etc.

davidmorgan commented 5 years ago

Thanks Erik.

I think what you are describing fits the built_value use case, in that you might want to deserialize a type that is never mentioned in the code. This happens because Foo, which is mentioned in the code, can have alternative implementations e.g. GoldenFoo and SilverFoo which are never mentioned. Each needs initializing if it's going to be supported at runtime.

I'm not sure if there is a comparable case for protos; extensions are different and I don't know how they're used in practice / how they relate to client code: https://developers.google.com/protocol-buffers/docs/proto#extensions

Getting back to the proposal: I think there are three types of code change that we'd like to be able to 'catch', and by that I mean highlight them to the person making the change and the person reviewing it. Forcing a change to where the link time map is accessed, explicitly listing/rejecting types, would be one way to do this.

1) An actual dependency is added on a surprising initializable type. User starts using type SmallTimestamp and doing so causes BloatedTimestamp to be a non-tree-shakeable dependency of the app. When this is highlighted the change is rejected and a bug filed against SmallTimestamp--fix your dependencies.

2) A dependency is added on a new initializable type which remains tree shakeable; it's wanted. User starts using Foo, which pulls in GoldenFoo's initialization code but does not otherwise cause it to be used. The user knows that GoldenFoo is a type of Foo they might want to deserialize, and so wants to mark it so it does not get tree shaken.

3) As 2, but this time not wanted. User starts using Foo which pulls in SilverFoo's initialization code but does not otherwise cause it to be used. User knows that SilverFoo is a type of Foo that will never be used in their code, they want it to be tree shaken.

eernstg commented 5 years ago

@davidmorgan wrote:

  1. An actual dependency is added on a surprising initializable type

If initializable types are written in such a way that each of them, if not tree-shaken away, will add itself to a specific link-time collection initializables, then it might be enough to check that collection for unexpected elements. A very simple approach could be to check the size:

main() {
  // If this fails, check for unwanted elements in `initializables`.
  assert(initializables.length == 31);
  ...
}

More precision could be achieved by comparing the set (or the keys if it's a map) to a manually written set of expected classes, or check that initializables is a subset of an approximation from above, etc.
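
For example, a hedged sketch of that more precise variant (initializables is assumed to be the gathered Set<Type>; Foo and Bar are placeholders):

class Foo {}
class Bar {}

// Manually maintained approximation from above: everything we expect to see.
final expectedInitializables = <Type>{Foo, Bar};

void checkInitializables(Set<Type> initializables) {
  final unexpected = initializables.difference(expectedInitializables);
  assert(unexpected.isEmpty, 'Unexpected initializable types: $unexpected');
}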

3 .. User starts using Foo which pulls in SilverFoo's initialization code but does not otherwise cause it to be used. ... they want it to be tree shaken.

For soundness it would not be possible to omit something which is considered reachable by the tree-shaking algorithm. So this sounds like some author of the code of Foo (or the developer who wrote the c in a contribution declaration like const mySet.add(c) if Foo;) somehow introduced a spurious dependency link that reaches SilverFoo in some number of steps.

I believe this could be addressed using the same kind of techniques as those for bullet 1. So when the use of Foo is added to the program, the set of included classes (of some kind) grows, and if they are intended to register themselves with some kind of a link-time collection myCollection then the value of myCollection before and after the change can be inspected (just print it at the beginning of main), and unexpected changes can be explored in order to ensure that dependencies are added for exactly the entities that we actually want to have.

davidmorgan commented 5 years ago

Good point, we can write a test that sits next to main and asserts on the contents of the map/set. It's a little awkward to get right; the test needs to arrange that tree shaking for the test happens exactly as for main.

But I expect we could come up with a working/workable pattern to recommend.

For me this is a good approach to the problem: collect the configuration for the whole app in one place, and make whatever asserts on it you like, in a test. It's true that if you don't follow best practices and have such a test, your app risks ending up bloated; but I think that's true of all answers we have for this problem, including fully manual configuration.
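
A hedged sketch of what such a test could look like with package:test; initializables is the hypothetical gathered collection exposed next to main, and, as noted above, the test target would have to be tree-shaken exactly like the app for the assertion to mean anything:

import 'package:test/test.dart';

// Hypothetical: the app's entrypoint library exposes the gathered collection.
import 'main.dart' show initializables;

void main() {
  test('only the expected initializables are registered', () {
    // Same idea as the size assertion above, expressed as a test; a real test
    // could compare against an explicit expected set instead.
    expect(initializables, hasLength(31));
  });
}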

I'm curious as to what others on the thread think.

eernstg commented 5 years ago

FYI: I just added the ability to rely on tree-shaking to the proposal in #371.

icatalud commented 4 years ago

If someone could summarize the difficulties with adding an initialization mechanism it would be appreciated (the discussion is very long). The following is what I have wished for more than once:

default A.initialize();

class A {
  static initialize() {}
}

Essentially I would expect Dart to invoke all packages' defaults before running main. Is there any important consideration why something like that cannot be included?

Since there is "lazy loading", this could be thought of as "proactive loading". Alternative wording could be "expedite" or "proactive".

Currently I'm only able to achieve initialization through an "unhygienic" hack.

eernstg commented 4 years ago

The issue that usually causes problems with the model where each module (whatever notion of module a language might have) has one or more initialization code blocks that are executed implicitly before main runs is that there is no safe notion of ordering. Initialization code usually has a lot of side effects, and it may cause crashes (or, worse, subtly different results) if the blocks aren't executed in a "good" order.

icatalud commented 4 years ago

Thanks, I understand better now.

Short answer

Resolve defaults in the same way Python resolves module imports, but through static analysis (even better). If there is circularity among default method invocations, it is forbidden. The order is resolved by the library import order, which is not ideal (because import order matters), but it is a simple solution, and I'm pretty sure that default methods writing the same variables would be fairly uncommon. Being practical, keep in mind that Python works like this and is one of the most popular languages in the world.

It is possible to constrain initialization more by allowing at most one initialization invocation per library. Constrain it even more by allowing only native var types and only writing uninitialized vars defined within the scope of the library. Think of it like a class constructor where the vars are initialized, but for the scope of the library (the vars are the "scope class" members).

Whichever mechanism is chosen, it should be simple and direct, keeping in mind that most use cases will not run into most of the problems that arise from initialization, which are edge cases reached by an exploring minority.

Long deliberation (initialization trick)

The problem is that initialization is currently already possible; it's just that it is not elegant. For example, I do the following:

final _initialize = () {
  // perform initialization here
  return null;
}();

// Somewhere else in the code
class A {
  final x;
  A(): x = _initialize ?? 0;
}

I have no idea when exactly _initialize is evaluated, whether preemptively or lazily; I just know that it is working for me. The reference to _initialize is included just so that it is not branch cut.

That kind of initialization does give the compiler a notion of ordering which I presume is the Pythonic one explained above.

Given that, I would expect the default invocation to implicitly have the same behavior as placing the above trick on all members in a library (including everything that comes from the library, also types), like var libMember = default ?? memberValue. This means that if the whole library is not branch cut, its initializer should theoretically be invoked. Currently I'm assuming there is a predictable, established order rule to resolve the following:

library lib1;
var x = 2;

// Used in class A constructor
final _initA = () {
  x += 1;
}();

// Used in class B constructor
final _initB = () {
  x *= x;
}();

When x is accessed, which value will it have: 2, 3, 4, 5, or 9? There is no way to intuitively know this (there are different valid intuitions). If class B is accessed first in the code, is _initB executed first? Most developers are probably clueless about how the compiler resolves these cases. Any default execution mechanism is better than the _init trick, which is not transparent because the value of x at the very least depends on the result of the branch cut; a default method would always be invoked if any of the vars it writes are not branch cut (the way it should be).

Now imagine this:

// imports lib1
library lib2;

final _initA2 = () {
  lib1.x *= 3;
}();

Whatever rule currently exists to resolve this should be considered inconsistent, since it’s too complex for a person to resolve. The fact that there is no easy initialization mechanism is preventing this odd behavior from frequently happening in libraries by hiding the problem, but the problem is still there.

Python approached this by transparently explaining exactly how the package dependencies resolve; I'm not sure how the cases above resolve in Dart.

Personal thoughts

Personally I do not like at all the fact that import order matters, because it's a hidden effect that cannot be predicted unless internal details of the packages are known, which breaks the package encapsulation ideal and adds an asterisk to the import statement definition. But that's a wish; being realistic, initialization is highly required and the short answer explained above is a decent solution, where the benefits outweigh the inconveniences, unless I'm missing something big. I'm building a library and I do miss an official var initialization mechanism. Package dependency order mattering shouldn't be thought of only as something negative; it also opens the possibility to transparently create "initialization libraries" which tweak values of mainstream libraries.

jakemac53 commented 4 years ago

Note that your initialization trick doesn't actually work because the closure is evaluated lazily, see https://dartpad.dartlang.org/d036b1c1e58bf0e5172c54ce0cc3d8e5.
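
For reference, a small self-contained example of that lazy behavior (top-level variables in Dart are initialized on first read, not at program startup):

final _initialize = () {
  print('initializer ran');
  return null;
}();

void main() {
  print('main started');
  print(_initialize); // Only this read triggers the initializer above.
}

// Prints:
//   main started
//   initializer ran
//   null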

icatalud commented 4 years ago

I have been using the trick in the constructors of the key classes that require the initialized variables. It can be done in any part of the code that is reached from the visible API of the library. That explains, though, how Dart deals with these values; the same rule could apply for any initializer. The initializer is run lazily as soon as the first member or type from a package is referenced. Assume an initializer is implicitly initialize ?? anythingOfPackage. I don't know if I prefer that kind of initialization over the order-of-packages one, because I find the order of packages more transparent than the order of code execution. Depending on the code execution order is also more inconsistent because of branching.

In any case, as of today, with lazy loading (python also offers lazy loading of imports) Dart can have initialization circularity errors. It's not different from actual initialization methods producing an error.

dartpad initialization error

jakemac53 commented 4 years ago

This issue was originally more around the use case highlighted in my dartpad example "frameworks need a way to run initialization code for all leaf code using the framework".

If you already know all the variables that you want to be "initialized" then the trick you provided is already available and is the recommended solution. Manipulating global state in an order dependent manner inside of those initialization closures should generally be avoided for the reasons you have highlighted.

icatalud commented 4 years ago

I wanted to open a request for initialization but I found this one, which seems more specific to me, unless I'm misunderstanding something. If there were some kind of initialization, the following could be done:

initialize {
  registrations = Platform.registeredConsts(); 
}

The initialization could be lazily evaluated, as you showed happens with the closures, but like I said, I think lazy initialization is less consistent.

The trick I posted is exactly why I wanted to ask for an official language mechanism to achieve this with readable syntax. I think initialization is a common enough requirement to deserve its own language mechanism. I prefer package initialization resolution by import order over the execution-dependent lazy evaluation.