bvaughn / react-virtualized

React components for efficiently rendering large lists and tabular data
http://bvaughn.github.io/react-virtualized/
MIT License
26.39k stars 3.06k forks source link

Tree-shaking improvements #889

Closed Andarist closed 3 weeks ago

Andarist commented 6 years ago

The issue came up on Twitter - https://twitter.com/brian_d_vaughn/status/936721354283294720 and it's also discussed in the main README. I believe we can provide a better developer experience out of the box at slight costs.

What's going on with a library such as react-virtualized? Tree-shaking algorithms are extremely fragile (especially webpack's - rollup's is way superior at the moment) and they bail out easily from the optimizations.

Why? It is hard to statically analyze a language, even harder analyzing dynamic language and those algorithms don't want to break anyone's code unexpectedly, so they behave in overly safe manner (which overally is probably a good thing). In other words - they are afraid of possible side effects. In many cases they probably don't even whitelist known globals as side effect-free.

What can even be tree-shaken? Mainly top level statements. Many statements consist of expressions though, which generally speaking are trickier to eliminate. Let's consider simple:

export default class List extends React.Component {
    render() {
        return null
    }
}

What can be seen here:

Result - tree-shakeable

Now consider the same but when classes are transpiled down to es5 (using loose mode for better readability):

var List = function (_React$Component) {
    babelHelpers._inherits(List, _React$Component);

    function List() {
        babelHelpers._classCallCheck(this, List);

        return babelHelpers._possibleConstructorReturn(this, _React$Component.apply(this, arguments));
    }

    List.prototype.render = function render() {
        return null;
    };

    return List;
}(React.Component);

export default List;

What can be seen here:

Result - not tree-shakeable

Maybe at this point we should go back a little and talk about what tree-shaking is doing in webpack. So as we know by default webpack wraps each module in a closure, it supplies module, __webpack_exports__, __webpack_require__ arguments so the module can register itself & require other modules from the global webpack's module registry. When it detects that an export is a candidate for tree-shaking it simply removes the export statement and adds a comment like /* unused harmony default export */ (that is probably only for debugging etc), but it doesn't actually remove the "tree-shaken" code. It shifts that responsibility on minifiers like UglifyJS which do dead code elimination later. In my opinion this is a huge mistake and part of the reason why rollup performs better when it comes to tree-shaking - it actually removes the "dead code", or maybe it's better said that it doesn't include it, it only includes live code.

But going back to webpack - the export from second snippet gets marked as unused by webpack too, so why it doesn't work? Because UglifyJS doesn't know that it is safe to remove this call, it just has not sufficient information to determine statically that this IIFE does not produce any side-effects and that it is safe to remove it.

What can be done about it? Giving a small hint to UglifyJS about nature of this IIFE by adding #__PURE__ comment before the call. If we add it dead code elimination will be able to use it and remove it for us if the webpack has marked the export as unused before (thus it has skipped assigning it to __webpack_exports__).

:tada: babel@7-beta already adds those #__PURE__ annotations, although we might need to tweak it in babel, because not every class declaration is pure (computed properties etc can make them impure).

So the answer seems to be obvious - just add #__PURE__ and be done with it, right?

Unfortunately it is not that simple. Consider classes with static properties (and we need them for React components, no matter if we use a proposal class properties transform or not):

export default class List extends React.Component {
    static propTypes = {}

    render() {
        return null
    }
}

which transforms to:

var List = function (_React$Component) {
    babelHelpers._inherits(List, _React$Component);

    function List() {
        babelHelpers._classCallCheck(this, List);

        return babelHelpers._possibleConstructorReturn(this, _React$Component.apply(this, arguments));
    }

    List.prototype.render = function render() {
        return null;
    };

    return List;
}(React.Component);

List.propTypes = {};

export default List;

What can be seen here:

Result - not tree-shakeable even with #__PURE__

Why? Again - UglifyJS' algorithms are not smart enough to know that a static assignment - List.propTypes = {}; - can be removed and while it is there whole List' IIFE cannot be removed because List is used (by that static assignment).

Generally speaking it's easier for minifiers to remove stuff (i.e. a class) if they are represented by a single statement, therefore this could be removed:

var List = /* #__PURE__ */ function (_React$Component) {
    babelHelpers._inherits(List, _React$Component);

    function List() {
        babelHelpers._classCallCheck(this, List);

        return babelHelpers._possibleConstructorReturn(this, _React$Component.apply(this, arguments));
    }

    List.prototype.render = function render() {
        return null;
    };

    List.propTypes = {};

    return List;
}(React.Component);

export default List;

:tada: but how to achieve such output? There are 3 options here:

:tada: those things mentioned would help...

...but unfortunately wouldn't solve webpack's problem entirely. As mentioned webpack is not even removing dead code and if you have a module like this:

/* 3 */
/***/ (function(module, __webpack_exports__, __webpack_require__) {

"use strict";

/* harmony import */ var _Grid__WEBPACK_IMPORTED_MODULE_0__ = __webpack_require__(0);

var List = /* #__PURE__ */ function (_React$Component) {
  babelHelpers._inherits(List, _React$Component);

  function List() {
    babelHelpers._classCallCheck(this, List);

    return babelHelpers._possibleConstructorReturn(this, _React$Component.apply(this, arguments));
  }

  List.prototype.render = function render() {
    return React.createElement(Object(_Grid__WEBPACK_IMPORTED_MODULE_0__["a" /* default */]), this.props);
  };

  List.propTypes = {};

  return List;
}(React.Component);

/* unused harmony default export */ var _unused_webpack_default_export = (List);
})

List will be removed correctly as unused, but List's dependencies (Grid here) won't. More can be read here (with standalone repo as example).

So what can be done about this?

NOTE There are also some other issues in react-virtualized which prevents proper tree-shaking, but those should be easier to fix.

bvaughn commented 6 years ago

Excellent write-up, Mateusz. Thank you for taking the time to add all of this info.

Andarist commented 6 years ago

The question is - would you like to incorporate things mentioned in the writeup to the repo? 😃

bvaughn commented 6 years ago

Are you asking if I would accept a PR or if I would implement the changes?

Andarist commented 6 years ago

Oh, I meant that I would like to create a PR for this of course :) Sorry for not being clearer.

We'd only have to decide what needs to be done from the all things mentioned - I guess flat bundling is the most controversial as it would be a breaking change.

bvaughn commented 6 years ago

No worries at all. 😄

I'd be happy to review a PR.

I think a good goal of this might be to use a flat, Rollup bundle for the default (CJS) distributable, and then continue exporting ES6 modules as well for people who want to reference them directly.

If we wanted to approach this much more aggressively, we could consider converting the repo to use Yarn workspaces too- and then distribute several NPM packages, eg:

We could also publish a react-virtualized bundle that just re-exports everything, for an easier intermediate migration step.

What do you think about the above? I wouldn't mind helping with this but I'm reluctant to commit to doing it myself, particularly around the holidays. (I have family visiting from Dec 22nd-29th and will be totally off the grid then.)

Andarist commented 6 years ago

I think a good goal of this might be to use a flat, Rollup bundle for the default (CJS) distributable, and then continue exporting ES6 modules as well for people who want to reference them directly.

Personally I would argue that differentiating structure of different entry points in this case would be a bad thing (unless it gives some real benefits). The only difference imho should be in the module format itself and not in the directory structure and ways how 1 can be consumed but other cannot.

If we wanted to approach this much more aggressively, we could consider converting the repo to use Yarn workspaces too- and then distribute several NPM packages

That's an option but I'm unsure what benefits it would give to the users, assuming we can make everything tree-shakeable here within a single package.

My first concern about this was that it would be harder to minify common code as it would have to become a package with defined API (thus identifiers need to stay the same, unmangled) etc but I guess bundlers doing scope hoisting would allow mangling those.

Other real downside that comes to my mind is versioning between packages - ofc we would make versions to line up with each other, everything being versioned the same, but that won't stop people from installing different versions of each subpackage (unconsciously or by accident) and that can lead to troubles.

And finally splitting it would make migrations somewhat harder (even considering docs and articles about react-virtualized out there in the wild) and give worse experience for webpack2 and webpack3 (when not used with ModuleConcatenationPlugin) users when using reexporting package of react-virtualized.

Let me know what do you think, I would love to start working on this some time this week. I don't mind doing this either way, just want to discuss tradeoffs of all proposed approaches first.

What do you think about the above? I wouldn't mind helping with this but I'm reluctant to commit to doing it myself, particularly around the holidays. (I have family visiting from Dec 22nd-29th and will be totally off the grid then.)

No worries, I got you covered ;)

bvaughn commented 6 years ago

I think a good goal of this might be to use a flat, Rollup bundle for the default (CJS) distributable, and then continue exporting ES6 modules as well for people who want to reference them directly.

Personally I would argue that differentiating structure of different entry points in this case would be a bad thing (unless it gives some real benefits).

The benefits would be continuing to support CommonJS (for anyone not yet using ES6 modules) without preventing tree-shaking for people using ES6. Although this wouldn't be necessary if we also did the scoped-packages thing I mentioned though.

If we wanted to approach this much more aggressively, we could consider converting the repo to use Yarn workspaces too- and then distribute several NPM packages

That's an option but I'm unsure what benefits it would give to the users, assuming we can make everything tree-shakeable here within a single package.

Only if you're proposing dropping CJS from the NPM distributable, which I don't think I'm willing to do. Otherwise users will get a flat module that isn't otherwise shakeable (so if they use only List and Grid they'll also get Tree and Masonry etc.)

My first concern about this was that it would be harder to minify common code as it would have to become a package with defined API (thus identifiers need to stay the same, unmangled) etc but I guess bundlers doing scope hoisting would allow mangling those.

So long as the dependency isn't built into the distributable (but left as an import/require statement) it should be fine.

Other real downside that comes to my mind is versioning between packages - ofc we would make versions to line up with each other, everything being versioned the same, but that won't stop people from installing different versions of each subpackage (unconsciously or by accident) and that can lead to troubles.

I'm not too concerned about this one to be honest. React does this (eg react, react-dom, react-test-renderer) and I think it works out okay.

And finally splitting it would make migrations somewhat harder (even considering docs and articles about react-virtualized out there in the wild) and give worse experience for webpack2 and webpack3 (when not used with ModuleConcatenationPlugin) users when using reexporting package of react-virtualized.

I don't understand the "worse experience for webpack 2 and 3" comment. Could you elaborate?

As for the migration experience, this is why I mentioned the option of continuing to publish a single react-virtualized module that glued the others together. I think this is kind of what Lodash does (eg lodash vs lodash.memoize).

billyjanitsch commented 6 years ago

As for the migration experience, this is why I mentioned the option of continuing to publish a single react-virtualized module that glued the others together. I think this is kind of what Lodash does (eg lodash vs lodash.memoize).

FWIW, that is not what Lodash does -- Lodash is a peculiar case. The lodash package has no dependencies. It ships a bunch of CJS modules (e.g. lodash/memoize) as well as a single entry (lodash) which is a flattened bundle, compiled at build time. Lodash v5 will not ship a flattened bundle, and will only ship the modules as ESM, using @std/esm for node/CJS support.

The per-method packages (e.g. lodash.memoize) also have no dependencies (the dependencies are flattened along with the source into a single file at build time). Also, they are being discontinued in Lodash v5.

Andarist commented 6 years ago

The benefits would be continuing to support CommonJS (for anyone not yet using ES6 modules) without preventing tree-shaking for people using ES6. Although this wouldn't be necessary if we also did the scoped-packages thing I mentioned though.

I was not proposing ditching cjs format, rather creating flat bundles for both cjs and es formats.

Only if you're proposing dropping CJS from the NPM distributable, which I don't think I'm willing to do. Otherwise users will get a flat module that isn't otherwise shakeable (so if they use only List and Grid they'll also get Tree and Masonry etc.)

Hm, Im not sure if we are on the same page here - I'm saying that flat bundles helps tree-shakability in webpack and im pretty confident that we can make a flat bundle for es6 modules format that would be tree-shakeable out of the box.

I don't understand the "worse experience for webpack 2 and 3" comment. Could you elaborate?

Webpack doesn't handle well tree shaking imports of your imports. Assuming react-virtualized package which would reexport Grid from @react-virtualized/grid which in turn would import some (but not all) @react-virtualized/helpers it might not remove unused helpers.

It's a little bit hard to explain but I have a simple demo for this. Please take a look at the result bundle here where only the add function is actually used, but the mult is also in the bundle because it was a dependency of unused pow. Not mentioning all of those extra webpack module wrappers which some are even empty, like this one - contents of such modules were removed as unused by the runtime module wrapper was left out, because webpack doesn't do full "tree-shaking", it just removes some assignments and let uglifyJS do the rest and well... uglify is not able to "escape" the module wrapper boundary during its analysis so it can remove only the contents.

As for the migration experience, this is why I mentioned the option of continuing to publish a single react-virtualized module that glued the others together. I think this is kind of what Lodash does (eg lodash vs lodash.memoize).

As mentioned by @billyjanitsch, lodash doesn't reexport those smaller modules and also single util packages containing other lodash utils are flattened so there is no code sharing in any of those cases. Each lodash package stands on its own and if you mix them you end up with duplicated code.

bvaughn commented 6 years ago

Regarding ES bundles- thanks for clarifying. I believe we're on the same page now. While I don't see added value in a flat ES bundle, I also don't have an objection. 😄

Regarding your Webpack 2/3 comment, I don't think that would actually be an issue for RV. Components like List and Tree depend on Grid. In turn, Grid depends on a set of helpers. These dependencies are not conditional though, so there really would be nothing to "shake" from eg the Grid dependency.

Regarding the Lodash comment...it feels like we might be getting a bit pedantic. I wasn't trying to suggest that Lodash did the exact thing I described, just something conceptually similar (one monolithic distributable, and then smaller stand-alone distributables). The main difference I see is that the standalone ones I propose may have dependencies (eg List depends on Grid) in order to avoid bundling 2 copies of Grid code in an application that uses both Table and List. Avoiding that duplication is important I think.

Andarist commented 6 years ago

Regarding ES bundles- thanks for clarifying. I believe we're on the same page now. While I don't see added value in a flat ES bundle, I also don't have an objection.

Let's see how it goes then, I don't have really strong opinion on this either, just don't see (personally) much added value in using monorepo approach here.

I've already started to do some ground work, but to really make this all work flawlessly I need to tweak some things in Babel first.

TrySound commented 6 years ago

@Andarist Scaned thread fast. Maybe missed something.

requiring commonjs modules, such as classnames, babel-runtime/helpers

We can try babel7 for helpers which uses es modules and classnames is one like helper, so not a problem.

using a commonjs require (!) inserted by babel-plugin-flow-react-proptypes

Alpha version uses es modules, hope to integrate soon.

prototype assignments such as this (again - things as single statement are easier to remove) and Object.defineProperty calls

Can this be achieved with your plugin?

Andarist commented 6 years ago

requiring commonjs modules, such as classnames, babel-runtime/helpers We can try babel7 for helpers which uses es modules and classnames is one like helper, so not a problem.

I plan to use babel7 here, because it introduces PURE comments for various things. While I have added a module output type for babel helpers im not sure atm if it gets distributed to npm. Need to check and push it forward if neccessary.

It's also an interesting choice that this library use imported babel helpers, not many packages do that. It for sure should improve overall bundle size, as they can be shared between modules - just wondering if there are any downsides of this approach 🤔

As to classnames - I need to investigate what exactly can commonjs dep in es package do. How much impact does it have on preventing webpack doing its optimizations. Anyway I've already sent a PR there with a slight refactor (converted to es modules), but I guess there is no active maintainer at the moment and it didn't get noticed yet.

using a commonjs require (!) inserted by babel-plugin-flow-react-proptypes

Alpha version uses es modules, hope to integrate soon.

Yeah, I've asked for this after creating this very issue. Glad the maintainer could find time to work on this 👍

prototype assignments such as this (again - things as single statement are easier to remove) and Object.defineProperty calls Can this be achieved with your plugin?

Not rly, its focused on function calls and has not effect on prototype assignments. As to Object.defineProperty calls - it cannot help in this case either, because its output is not assigned to any variable, annotating it with PURE would mean that it would get dropped every time, no matter what. As Object.defineProperty in fact returns a reference to an object we could wrap the whole object in it, annotate the result which would be assigned to a variable.

But after further investigation I've realised that this is some other module forked & inlined in the source code (probably because the other package was not mainted and this package just needed the fix). I've already refactored this part locally to just use es6 classes syntax and this will be tree-shakeable when we upgrade to babel7.

TrySound commented 6 years ago

So all is good in the kitchen with good maintainment? Sorry I'm drunk a bit.:(

bvaughn commented 6 years ago

😂

alexkrolick commented 5 years ago

There's a module entry in package.json - would adding sideEffects: false have an effect now given Babel 7 and Webpack 4 are out (assuming there are actually no side-effects)?

TrySound commented 5 years ago

@alexkrolick We decided to remove sideEffects field some time ago to simplify distribution. style.css was an effect we didn't considered. There can be another effects which should not be removed unsafely. I'd say sideEffects leads to uncontrollable bugs.

This packages is still used with only path imports (react-virtualized/dist/es/List).

alexkrolick commented 5 years ago

I don't see that style.css is actually imported by the JS code within the package, so it doesn't cause a side-effect, but if it did, you can list exceptions like this:

"sideEffects": ["*.css"]

But if there are other effects it's reasonable to not add it.

TrySound commented 5 years ago

It's usually imported by user.

That's my point. After this bug I realised how unsafe sideEffects is. And I don't know the code of this project enough to add this field safely.