GoogleChromeLabs / tooling.report

tooling.report is a quick way to determine the best build tool for your next web project, whether a tooling migration is worth it, or how to adopt a tool's best practices into your existing configuration and code base.
https://tooling.report
Apache License 2.0

Review #90


sokra commented 4 years ago

@developit requested a review. Here you are:

General

Best practices?

In the email you claimed:

The tests show if and how a tool allows web developers to follow certain best practices in web development.

While I think this is true for some test cases, I also think that some test cases test very specific features or edge cases. This makes it difficult for developers to see what the best practice is here. I would recommend splitting (or marking) the test cases into best practices and features.

Plugins and Configuration

In my opinion the test results are very difficult to compare because they don't show how much plugin code + configuration is needed to pass a test. In the rollup tests I sometimes find very long plugins, while parcel can pass some tests without any configuration. I mean, any bundler can probably pass any test when you throw enough plugin/infrastructure code at it (or magic regexp replaces to work around the problem). I would recommend setting a plugin + configuration LOC limit and/or displaying plugin + configuration LOC in the test results.

Custom plugins

You have written quite a lot of custom plugins for rollup. While in some cases it makes sense to have them in your codebase (like non-js-resources/custom), in most cases developers won't do it this way. Such plugins should usually be published as general solutions, as reusable open source plugins. At least that is what I would consider "best practice".

This is especially relevant as some of the plugins in the test suite are kind of mocks, which only implement enough to treat the test as passed (if even that?), but return fake URLs, don't work in watch mode, have no caching, etc. So in practice they are not usable if somebody tries to follow the configuration and plugin code in the test. I would not consider these tests as passing in their current state, because the solution is not usable in a real codebase.

I assume it's still possible to implement the solution in a usable, general-purpose fashion, but the current state doesn't prove that, in my opinion.

In addition, having a lot of custom plugins may make rollup appear a lot more complicated than it is. The average developer might be scared off by the amount of custom plugin code needed to cover these features.

Tests

Regarding webpack

Code Splitting

Some comments and configurations about webpack rest on a wrong assumption: they say that when code is duplicated between chunks it is instantiated multiple times. This is not true. Even when module code is duplicated, it's only ever instantiated once, as long as it's part of the same webpack runtime (optimization.runtimeChunk: "single"). In my opinion, duplicating code can sometimes even perform better than avoiding duplication entirely. It's actually a trade-off between transferring more code (less caching) vs. issuing more requests (more files). More requests/files usually come with less effective compression and additional HTTP overhead (headers, etc.). With HTTP/1.1 it can even cause additional round-trip delays because of connection limits. Even if server and browser support HTTP/2, they may have to fall back to HTTP/1.1 because of old middleboxes/proxies, but that's rare. That's why webpack uses a heuristic to decide when code is duplicated (below ~30kb, or when the limit of 6 requests per import() is reached) and when an additional file is created.

These tests are affected:

You should remove splitChunks.minSize: 0 from all test cases if you want to show the best practice.
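For reference, a minimal sketch of the configuration pattern in question (not taken from the repo):

// webpack.config.js — setting minSize to 0 forces webpack to split out
// even tiny shared modules; deleting the override restores the default
// size/request heuristic described above.
module.exports = {
  optimization: {
    splitChunks: {
      chunks: "all",
      minSize: 0, // test-only override; remove to follow the defaults
    },
  },
};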

dynamic-import

webpack follows the guideline that an import() should only need a single round-trip to load. Multiple requests are allowed, but they must happen in parallel.

I would consider this a best practice and would recommend adding an additional test to cover it.

splitting-modules

While your examples indeed don't work, there is a scenario that works in webpack, and I believe in rollup too (I don't know about parcel; you can try it).

It's used in practice by libraries like lodash-es.

Example:

// objects.js (must be side-effect-free)
export { foo } from "./foo";
export { bar } from "./bar";

// foo.js
export const foo = { name: 'foo' };

// bar.js
export const bar = { name: 'bar' };
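For completeness, a consumer would then import through the side-effect-free re-export module; this is what lets a bundler split foo and bar into separate chunks, or drop the unused one:

// app.js
import { foo } from "./objects";
console.log(foo.name); // only foo.js needs to end up in this chunk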

Hashing

avoid-cascade

It's a trade-off. Hoisting all these asset hashes/URLs into the importMap/manifest can affect initial page load negatively. webpack has gone the route that everything you put behind an import() is loaded on demand. It still avoids the cascade by limiting the number of affected files to O(1). Yes, it's 2 files affected instead of 1, but that's a cost I'm willing to pay for not having to download all asset hashes/URLs on the initial page load.
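To illustrate the trade-off, a hoisted manifest of the kind described might look like this (hypothetical file names); avoiding the cascade this way means the whole map ships with the initial page load:

// asset-manifest.js — hypothetical: maps stable ids to hashed URLs.
// Changing one asset only touches this file plus the asset itself.
export default {
  "./foo.js": "/assets/foo.abc123.js",
  "./styles.css": "/assets/styles.def456.css",
};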

avoid-hash

webpack 5 will support that, but I really don't think you need it in practice. Why would you want to leave the hash off only some of your import()-generated JavaScript files?

Importing modules

AMD is also found in some old npm modules and legacy codebases.

CommonJS <-> ESM interop is probably also worth testing. Some edge cases are handled differently between bundlers, but there is no clear spec for this, so nobody is wrong.

rollup handles CommonJS differently from parcel and webpack, by converting it to ESM. This has some limitations, which makes sense to test here (conditional require(), circular dependencies with nested require(), strict mode).
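As an example of the first limitation, a conditional require() like the sketch below has no direct ESM equivalent, because import statements are hoisted and unconditional (hypothetical module names):

// index.js (CommonJS) — only one branch should ever be loaded.
let impl;
if (process.env.NODE_ENV === "production") {
  impl = require("./impl.prod");
} else {
  impl = require("./impl.dev");
}
module.exports = impl;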

There are some differences in how bundlers handle non-static import(), e.g. import(`./images/${name}.png`). This is also worth covering.
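A sketch of what this means in practice: the bundler only sees the template literal at build time, so it has to include every file matching ./images/*.png for the call to work at runtime, and tools differ in how (and whether) they do this:

// `name` is only known at runtime, so the bundler must bundle all
// matching assets up front.
async function loadImage(name) {
  const mod = await import(`./images/${name}.png`);
  return mod.default; // typically the asset URL
}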

Non-JS-Resources

CSS

I find a lot of best practices not covered by the tests here.

In my opinion the best practice is to put the CSS into a separate .css file. Not a single one, though, but a .css file per import() (or multiple, for deduplication, similar to JS).

This CSS file should be loaded in parallel with the JS part. Importing styles just to get a URL to the .css file isn't the best way, in my opinion.

Styles in the entrypoint need to be referenced as <link rel="stylesheet"> in the HTML.

For styles behind import(), a <link rel="stylesheet"> should be added dynamically. The stylesheet must be fully loaded before the JS executes, otherwise a FOUC (flash of unstyled content) occurs.

For webpack, the usual guideline is that one import() should only need a single round-trip to load (it can still issue multiple requests, as long as they happen in parallel). This also applies to loading stylesheets.
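A minimal sketch of that guideline applied to import()-ed styles (hypothetical helper and file names, not from the test suite): both requests start at once, and the module code only runs after the styles are applied, avoiding the FOUC:

// Resolves once the stylesheet is loaded and applied.
function loadStylesheet(href) {
  return new Promise((resolve, reject) => {
    const link = document.createElement("link");
    link.rel = "stylesheet";
    link.href = href;
    link.onload = () => resolve();
    link.onerror = reject;
    document.head.appendChild(link);
  });
}

// One round-trip: stylesheet and JS chunk load in parallel.
const [, mod] = await Promise.all([
  loadStylesheet("/feature.abc123.css"),
  import("./feature.js"),
]);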

For hashing: ideally, changes in the CSS don't affect the hash of the JS part.

Custom

How "easy" is it to integrate watching and caching for these custom types?

Differences in general principles

It would be nice to highlight differences in higher-level principles.

Example:

In parcel and webpack, chunks are containers for module factories. This means chunks can be loaded without the side effect of executing the modules in the chunk.

In rollup, chunks are modules. This means loading a chunk is equal to evaluating the modules it represents (depending on output.format; assuming ESM here).

This can be used to explain some of the effects.
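Roughly, the difference looks like this (shapes simplified; the webpack global name varies by version and build):

// A webpack/parcel-style chunk registers module factories without
// executing them — evaluation happens later, on demand:
self.webpackChunkApp.push([
  ["chunk-id"],
  {
    "./foo.js": (module, exports, require) => {
      /* module body runs only when required */
    },
  },
]);

// A rollup chunk (output.format: "es") is itself a module — importing
// it evaluates this code immediately:
export const foo = { name: "foo" };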

More ideas

static-build

I would really like to see the final result, but sadly the build doesn't run on Windows and I wasn't able to fix it easily. That's something you could avoid in the long term by publishing your custom plugins as open source for general usage instead...!

Disclaimer

All this is my personal opinion. If it sounds a bit harsh towards rollup, that was unintended. It's a great tool, and my criticism is not aimed at rollup itself, but at the way the tests are implemented. To me it seems like the authors favor rollup in general, which is not bad at all, but to be an objective test suite some adjustments need to be made.

TODO list

At least what I would do...

surma commented 4 years ago

Hey @sokra,

I very much appreciate you taking the time to write up this much feedback for us. We’ll talk through this within the team in the next week and see what we can incorporate.

We’ll update you here :)

developit commented 4 years ago

Awesome writeup @sokra - can't thank you enough for how thorough this is!

Some quick notes from me:

A bunch of your other suggestions are really good things that we missed or glossed over. In particular the commonjs/esm interop stuff - that certainly can bite developers and cost them time, and I think that would make for some really good tests from a documentation standpoint.

I think the issue with multiple instantiation was mostly a wording one - I had meant to indicate that this was an issue only when runtimeChunk:'single' was not in use, which is perhaps worth noting given that it's not the default value. However, the test pass/fail/partial status should really reflect the correctly-configured result, as you mentioned.

Regarding the splitChunks.minSize, you're right to point that out - that's one case where we had to choose between having a concise test or demonstrating a best practice. For context - I went with the concise test, because the alternative would be to pad out the module with a huge string (we did this for Parcel because its size heuristic isn't configurable). I actually think the ability to configure that size threshold is a selling point, and we might be able to address the best-practice issue by adding a comment in the config stating that this was done strictly for testing purposes. Then I can remove the note about minSize from the test description entirely, since most developers should just use the default value.

About some of the CSS/hashing tests - I think our descriptions here could use some work. The thing that these were testing is actually not super applicable to Webpack because it uses a runtime, and we don't do enough to explain "some bundlers actually don't and shouldn't cascade" in our test descriptions. Right now, the tests try to ensure that any bundles with references to an asset are invalidated/updated if the asset's URL changes. That means file-loader makes sense to test since it embeds asset URLs, but not chunks or extracted assets.
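For example, with webpack's file-loader the hashed asset URL is embedded into the importing bundle, so a change to the asset has to invalidate that bundle too - which is exactly what the cascade tests check:

// The emitted URL (e.g. /assets/logo.abc123.svg) is inlined as a
// string into this module's compiled output.
import logoUrl from "./logo.svg";
document.querySelector("img").src = logoUrl;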

At a high level, it's possible we need some sort of test for "loads CSS in the most effective way" - I don't think any of our tests currently capture that, they're just testing behaviors. One of the reasons this is difficult is because this is an area where the tools differ pretty substantially. Webpack and Parcel tend to aim for holistic solutions ("load the CSS with your JS"), whereas Rollup and Browserify tend to aim for lower-level solutions ("reference CSS from JS"). Your point about there being a lot more custom plugins here for Rollup than for Webpack seems like it's at least partly a result of this - we're trying to paper over the differences between "batteries included" solutions and "Bring Your Own" solutions, which is a bit awkward. Ultimately the goal is to establish a representative comparison despite these differences, but it's possible we're missing some sort of metric (like LOC) that would balance out the amount of customization required to achieve a "pass" result.

Watching / HMR / dev server stuff gets ... tricky. I'd mentioned in my email that we've tried to avoid testing anything of that nature so far, because dev/watch features generally don't affect production output. This is biased, as you noted, but it comes from unbiased reasoning: if we can't measure something objectively (e.g. the way we can measure production output), it's harder to assess objectively. We really don't want to misrepresent tools when comparing them, so I think in order to explore testing dev-time or watch-related features, we need to find a way to measure them. Perhaps some combination of (repeat) build performance, JS size for updates, update granularity, etc. We just haven't yet been able to put together a strong enough set of metrics that would make this objective.

jakearchibald commented 4 years ago

@sokra

In my opinion the test results are very difficult to compare because they don't show how much plugin code + configuration is needed to pass a test.

The test results will link to the example for each build tool, so developers will be able to compare and contrast.

In the rollup tests I sometimes find very long plugins, while parcel can pass some tests without any configuration. I mean, any bundler can probably pass any test when you throw enough plugin/infrastructure code at it (or magic regexp replaces to work around the problem). I would recommend setting a plugin + configuration LOC limit and/or displaying plugin + configuration LOC in the test results.

In Webpack, even something as simple as https://github.com/webpack/webpack/blob/master/lib/DefinePlugin.js?source=cc is pretty long, but I don't think its usage should be disallowed.

Also, I don't think LOC is a good measure of complexity.

You have written quite a lot of custom plugins for rollup. While in some cases it makes sense to have them in your codebase (like non-js-resources/custom), in most cases developers won't do it this way. Such plugins should usually be published as general solutions, as reusable open source plugins. At least that is what I would consider "best practice".

This is especially relevant as some of the plugins in the test suite are kind of mocks, which only implement enough to treat the test as passed (if even that?),

If there are tests marked as passing but aren't actually passing, please let us know.

but return fake URLs,

I don't think this is the case. In some plugins 'fake' URLs act as placeholders, but these are updated to be actual URLs once files are hashed.

don't work in watch mode

That's fair. I've filed https://github.com/GoogleChromeLabs/tooling.report/issues/93 for this. It's a one-line change in most cases.
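For context, the kind of one-line change meant here (a hypothetical sketch, not the actual fix in that issue): a rollup plugin that reads files outside the module graph can register them for watch mode with this.addWatchFile():

// rollup plugin sketch — files in the module graph are watched
// automatically; side inputs the plugin reads are not, until registered.
import fs from "fs";

export default function customTypePlugin() {
  return {
    name: "custom-type",
    transform(code, id) {
      if (!id.endsWith(".custom")) return null;
      const sideInput = "custom.config.json"; // assumed extra input
      this.addWatchFile(sideInput); // the one-line change
      const config = JSON.parse(fs.readFileSync(sideInput, "utf8"));
      return {
        code: `export default ${JSON.stringify({ code, config })};`,
        map: null,
      };
    },
  };
}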

So in practice they are not usable if somebody tries to follow the configuration and plugin code in the test. I would not consider these tests as passing in their current state, because the solution is not usable in a real codebase.

Aside from the watch issue, they seem usable to me, although I might be missing something. Can you refer to specific tests that are marked as passing but shouldn't be?

In addition, having a lot of custom plugins may make rollup appear a lot more complicated than it is. The average developer might be scared off by the amount of custom plugin code needed to cover these features.

We plan to summarise tests, and if developers concluded:

…that would certainly fit with my personal experience. But a lot of developers want to stick to existing plugins, and don't mind if that prevents them from doing certain things. That's fine. Hopefully the data presented will help them make the right choice for them.

This is not true. Even when module code is duplicated, it's only ever instantiated once, as long as it's part of the same webpack runtime (optimization.runtimeChunk: "single").

Good to know! I think we need to decide if this is a 'pass' or a 'partial pass'. Duplicating modules can introduce bugs at runtime, so it's a bit of a footgun, but it's good that there's a way to opt out of it. It feels like it should be some combination of opt-in and auto-detected, like the 'pure module' stuff.
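To spell out the footgun: if a stateful module like this sketch is duplicated into two chunks that don't share a runtime, each copy gets its own state, and code that assumes a single shared instance breaks:

// cache.js — two instantiated copies means two independent `seen` sets.
const seen = new Set();
export function markSeen(id) {
  seen.add(id);
}
export function hasSeen(id) {
  return seen.has(id);
}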

webpack follows the guideline that an import() should only need a single round-trip to load. Multiple requests are allowed, but they must happen in parallel.

I would consider this a best practice and would recommend adding an additional test to cover it.

Yeah, that's a really nice feature. Filed https://github.com/GoogleChromeLabs/tooling.report/issues/94. Does that test make sense?

splitting-modules

While your examples indeed don't work, there is a scenario that works in webpack, and I believe in rollup too (I don't know about parcel; you can try it).

It's used in practice by libraries like lodash-es.

Agreed. Filed https://github.com/GoogleChromeLabs/tooling.report/issues/95. Does that make sense?

webpack 5 will support that, but I really don't think you need it in practice. Why would you want to leave the hash off only some of your import()-generated JavaScript files?

It definitely makes sense for entry files, as things like a service worker or a third-party entry point should have a stable URL. But yeah, maybe it doesn't make sense for import(). What do you think @developit @surma @kosamari?
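One way that could look in webpack 5 (a sketch with assumed entry names, not a tested config): output.filename accepts a function, so a service worker entry can keep a stable URL while everything else is hashed:

// webpack.config.js
module.exports = {
  entry: { main: "./src/index.js", sw: "./src/sw.js" },
  output: {
    filename: (pathData) =>
      pathData.chunk.name === "sw" ? "sw.js" : "[name].[contenthash].js",
  },
};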

rollup handles CommonJS differently from parcel and webpack, by converting it to ESM. This has some limitations, which makes sense to test here (conditional require(), circular dependencies with nested require(), strict mode).

Conditional require seems particularly interesting.

There are some differences in how bundlers handle non-static import(), e.g. import(`./images/${name}.png`). This is also worth covering.

Interested in more details on this!

CSS

I find a lot of best practices not covered by the tests here.

Yeah. Webpack does a really good job with CSS compared to other tools, and we're not doing enough to highlight that. I've filed https://github.com/GoogleChromeLabs/tooling.report/issues/96 with some ideas.

To me it seems like the authors favor rollup in general, which is not bad at all, but to be an objective test suite some adjustments need to be made.

I wrote the Rollup tests, and I do favour Rollup. @developit wrote the Webpack tests and I believe he's much more familiar with Webpack than Rollup.

Rollup has a simple well-documented plugin system, and that's a huge selling point for me. When I've used other build tools and their plugins, I've often become frustrated that I can't do what seems like a simple thing, because the plugin has taken ownership of my code, and it doesn't offer an option for the thing I need.

I realise not all developers are like that, but I'm hoping tooling.report will make it clear that, if you're a developer who has similar frustrations to me, Rollup has advantages.

But yes, we need to make sure we're showing the best of all tools, so developers can make the best choice for them.