dart-archive / dev_compiler

DEPRECATED - Moved to main SDK
https://github.com/dart-lang/sdk/tree/master/pkg/dev_compiler
Other
133 stars 27 forks source link

Proposal: a new scheme for module and library names #619

Closed nex3 closed 8 years ago

nex3 commented 8 years ago

After talking to @vsmenon and @jmesserly about the constraints underlying module and library names, I want to put forth the following proposal. I believe this should still work for build systems like Bazel while being substantially more usable and comprehensible by end users. It would also fix #612.

First, I propose we do away with the --module-root and --library-root flags. They're currently used to determine the names of modules and libraries within those modules, so we'll need an alternate way to determine those.

For module names, I propose that by default the dev compiler uses the basename of the target file, post-processed if necessary to make it a valid Dart identifier. So if the user runs dartdevc -o module.js file1.dart file2.dart file3.dart, the module's name would be "module". There would also be a --module-name flag that would explicitly declare the name of the module. This would allow users with large build systems to provide their own module naming scheme that would allow them to avoid conflicts.

This also means that the compiler will no longer be able to determine the name of a dependency module from the path to its summary combined with the module root. To fix this, I propose that the module name be included in the summary, either by passing information to the analyzer or by adding some sort of dev compiler-specific header.

For library names, we can take advantage of the fact that they only need to be unique per-module and that the dev compiler knows the URIs of all files in the module at compile time. Each library name is computed by taking the shortest unique final sub-path of the library's path, and converting that to an identifier. For example, if a module contained only the library lib/src/foo.dart, its name would just be foo. However, if it also contained the library lib/foo.dart, their names would be src_foo and foo, respectively.

Library names would also be explicitly definable by the user. When passing a library to the dev compiler on the command line, the user would be able to add a : followed by an identifier to declare the identifier for that library. For example, dartdevc -o module.js file1.dart:one file2.dart would use an explicit name (one) for the library defined by file1.dart, but the default (file2) for file2.dart.

Like module names, library names would be included in the summaries emitted by the dev compiler. The default library names could potentially be recomputed from scratch based on the library URIs, but that wouldn't work for explicitly-named libraries.

jmesserly commented 8 years ago

LGTM!!!

jmesserly commented 8 years ago

On second read, some questions on module-root:

For module names, I propose that by default the dev compiler uses the basename of the target file, post-processed if necessary to make it a valid Dart identifier.

we shouldn't need module names to be Dart identifiers. They're the strings we need to generate in the ES6 import:

import { baz } from "foo/bar";
// in node.js:  let baz = require("foo/bar").baz;

The typical convention is they're root-relative just like Dart's package:, with support for relative paths if it starts with a dot: "./foo/bar"

When we generate a module, we shouldn't need its own module name. We only need the name for imported modules. EDIT: although it's probably true that our "legacy" browser module format does need its self-name. Doh.

This also means that the compiler will no longer be able to determine the name of a dependency module from the path to its summary combined with the module root.

One idea is to allow passing in the import name

-s path.sum,package_modules/path.js

That would tell us what to generate for the ES6 import/require directive.

I was originally hoping for a convention where .sum and .js files are next to each other on disk. That's how it works in most languages/compilers: often the executable code+metadata is packaged in the same output file.

I'm not sure if there's any other conventions in the JS transpiled language space. I think they tend to avoid it because they typically put the transpiled JS file next to the input file by default, that way they can avoid relative path problems. Also the user expresses the import themselves, rather than having the compiler figure it out.

jmesserly commented 8 years ago

fyi - loader spec for browsers https://whatwg.github.io/loader/#browser-loader-prototype-@@resolve

it's not spec'd yet the resolution policy, seems possible we might be forced to use absolute or relative paths.

nex3 commented 8 years ago

I was mostly thinking about this from the perspective of the dev compiler's custom module syntax, which is pretty different from how ES6 modules work. ES6 imports involve locating the actual JS file on the server (or on disk for Node), whereas the custom module syntax requires that all modules already be loaded in the browser, presumably by the user including them in <script> tags.

ES6 doesn't really even have a notion of a module's "name". Modules just have URLs. So the question is, what's the relative path from one JS file to another? I like your idea of allowing users to explicitly specify the paths from which to import dependency modules, but we should also have a sane default. I think assuming that all modules are in the same directory is probably the best we can do there; because modules don't need to correspond to the locations of Dart files on disk, there's not much we can infer otherwise.


For ES6 modules, I propose that the notion of a "module name" as an inherent attribute of a module be abandoned entirely. Instead, when a summary path is passed on the command line for the compilation of a downstream module, the user will be able to associate an import path with it with the syntax summary:import. This import path will be used as-is in the from directive for importing that summary. For example, if I ran dartdevc -o app.js -s build/lib.sum app.dart, it would generate the following import:

import { /* ... */ } from 'lib';

And if I passed dartdevc -o app.js -s build/lib.sum:../shared/lib app.dart, it would generate:

import { /* ... */ } from '../shared/lib';

I believe Node modules have the same semantics as ES6 modules for the purposes we care about here.


This means the module schemes for Node/ES6 and the custom loader would be fairly different. I can see two possible ways to resolve this:

  1. Support both of my proposals, but disallow the --module-name flag when compiling a Node/ES6 module, and disallow the : syntax for summaries when compiling a custom module.
  2. Drop support for the custom module loader entirely in favor of a more standard solution—probably either RequireJS or a polyfill for the ES6 loader.

Since I'm already proposing a fairly comprehensive overhaul of the module UI, I favor the second option. It also makes users' lives simpler, by using the same module UI everywhere and using a well-understood standard.

nex3 commented 8 years ago

One small addition to the proposal: I also propose a --sdk-import flag that defines the path from which the SDK module is imported. This would default to dart_sdk, as it does today.

jmesserly commented 8 years ago

Lovely write-up. I think that sounds reasonable.

[...] because modules don't need to correspond to the locations of Dart files on disk, there's not much we can infer otherwise.

if we were to use the relative path (instead of base name), it would be relative to the JS file/summary location. So it doesn't matter where the .dart file(s) were, but what structure was created via -o path/to/module.js. Since that's under user control, it doesn't seem like a bad thing to use by default?

(oh, another thought: serving the .sum files next to the .js is useful for development, in particular the REPL and features like that can use it. It's sort of like a super Dart source map)

nex3 commented 8 years ago

if we were to use the relative path (instead of base name), it would be relative to the JS file/summary location. So it doesn't matter where the .dart file(s) were, but what structure was created via -o path/to/module.js. Since that's under user control, it doesn't seem like a bad thing to use by default?

Oh, good point. That's definitely a better default.

(oh, another thought: serving the .sum files next to the .js is useful for development, in particular the REPL and features like that can use it. It's sort of like a super Dart source map)

:+1:

jmesserly commented 8 years ago

from a quick check of the Bazel build rules, it looks like the only thing ever passed to --build-root (aka library-root) is / so we could probably rip it out completely with above proposal. For the things that end up as file:/// URIs we can simply encode the full absolute path, it is already relative to the right place. So that's a nice validation of the approach.

for module-root it looks like it may be passing in the root directory there too. It also looks very easy to change the -s (...which should probably be -m for --module) if we need to. Really encouraging sign.

CC @vsmenon @jakemac53 as FYI

vsmenon commented 8 years ago

This means the module schemes for Node/ES6 and the custom loader would be fairly different. I can see two possible ways to resolve this:

  1. Support both of my proposals, but disallow the --module-name flag when compiling a Node/ES6 module, and disallow the : syntax for summaries when compiling a custom module.
  2. Drop support for the custom module loader entirely in favor of a more standard solution—probably either RequireJS or a polyfill for the ES6 loader.

Since I'm already proposing a fairly comprehensive overhaul of the module UI, I favor the second option. It also makes users' lives simpler, by using the same module UI everywhere and using a well-understood standard.

Sounds good to me. We'll just need to stage the change over for bazel rules, with all use (2) right now.

jmesserly commented 8 years ago

https://github.com/dart-lang/dev_compiler/issues/626 for the requirejs compatible modules.

I think by adopting that we'll be in really good position for ES6 modules, as all 3 of our module formats (es6, commonjs, requirejs) will have essentially equivalent semantics, just slightly different syntax.

justinfagnani commented 8 years ago

Hey DDC team :)

Drive-by comment, that I hope is helpful.

If you're primarily targeting loading in browsers, then you might not want to generate or use module names at all. The loader that the browsers are currently implementing only loads modules via URL, not by name. Import specifiers must start with ./, ../ or /.

This should work well with package: imports as import 'package:foo/foo.dart' as foo; simply becomes import * as foo from '../foo/foo.js'; (with the appropriate number of ../s to reach the package root).

If you emit imports that use names in import specifiers, users will have to run another tool to convert the names to URLs, or bundle, or convert to another module format like AMD and use a user-land loader.

jmesserly commented 8 years ago

Import specifiers must start with ./, ../ or /.

where's this specified? what about absolute paths? "/base/foo/bar"

If you emit imports that use names in import specifiers, users will have to run another tool to convert the names to URLs, or bundle, or convert to another module format like AMD and use a user-land loader.

... couldn't that be configured in the module loader?

justinfagnani commented 8 years ago

where's this specified?

It's talked about here: https://blog.whatwg.org/js-modules The spec is just the HTML spec.

what about absolute paths? "/base/foo/bar"

Those work, but that assumes a lot about your server. In polymer we have an identical situation with HTML Imports and using ../package-name/ everywhere has worked out quite well.

... couldn't that be configured in the module loader?

The loader is not configurable. It just loads URLs.

jmesserly commented 8 years ago

Sad to see they handicapped the feature so much :(. Back to the drawing board I guess.

nex3 commented 8 years ago

If you're primarily targeting loading in browsers, then you might not want to generate or use module names at all. The loader that the browsers are currently implementing only loads modules via URL, not by name. Import specifiers must start with ./, ../ or /.

This is pretty much exactly what my ES6 module proposal is for :smiley:.

jmesserly commented 8 years ago

yeah but it means the import from "lib" option won't work... we can't even import "dart_sdk" without relative path shenanigans. sigh.

certainly the option where users pass all those in explicitly will work (and we can default to the relative path to summary file, I guess) but they'll need to not move any files.

it's kind of a bummer because nodejs and requirejs offer a lot more options for users to configure things how they want.

nex3 commented 8 years ago

Based on this comment, I was expecting that the default would always be to generate relative paths anyway. For dart_sdk we'd probably want to default to ./dart_sdk.

vsmenon commented 8 years ago

This issue was moved to dart-lang/sdk#27262