Importing a pure function includes unrelated code

marvinhagemeister commented 2 years ago

Noticed that the final bundle contained some unexpected code when dealing with nested bundles. If one imports a pure self-contained function with no dependencies to anything from the outer scope, I expected esbuild to only include that function. But whenever there is an unrelated line in the module which is detected as being a side-effect, esbuild will bundle the whole module instead of just our pure function.

// A.js
export function notIncluded() {
  return "I should not be included";
}

// This line leads to "notIncluded" being bundled
notIncluded._foo = 2;

export function used() {
  return "I'm included";
}

// B.js
import { used } from "./A";
export const value = used();

// C.js
export { value } from "./B";

Command:

esbuild --bundle C.js --outdir=dist

Actual Output:

The notIncluded function is included, despite not being referenced anywhere.

function notIncluded() {
  return "I should not be included";
}
notIncluded._foo = 2;
function used() {
  return "I'm included";
}

var value = used();
export {
  value
};

Expected Output

The notIncluded function should not be part of the bundle.

function used() {
  return "I'm included";
}

var value = used();
export {
  value
};

hyrious commented 2 years ago

Yes, getter and setter are treated not PURE in esbuild since they can have side effects. You can instead write your function like this:

var notIncluded = /* @__PURE__ */ (() => {
  function notIncluded() {
    return "I should not be included";
  }
  notIncluded._foo = 2;
  return notIncluded
})()

marvinhagemeister commented 2 years ago

Thanks for sharing the workaround! Ideally I'd prefer not to tailor source code to a specific bundler. It's an interesting problem seeing other bundlers not agreeing on the same behavior.

Bundle	drops `notIncluded`
esbuild	❌
parcel	❌
rollup	✔️
webpack	✔️

So it looks like they all have a different notion of when something has a potential side effect or not.

hyrious commented 2 years ago

Yes, but in fact esbuild is adopting a very simple rule on tree shaking, every code that can be shaked in esbuild can also be shaked in other bundlers (note that parcel disables tree shaking on library mode, you can test that in building a web page).

Now here is the rule:

Every primitive value (e.g. 42, "hello", [{}], function(){}) is pure (can be removed if no other code use it).
Every function call (including new X()) marked with /* @__PURE__ */ produces a pure value.

marvinhagemeister commented 2 years ago

I understand the way esbuild works and when it marks something as effectful. My point with this issue is that the current logic should be extended to handle the case described here.

hyrious commented 2 years ago

Yeah maybe it's worth for smaller package size. However that will introduce many code to analyse whether some variable is pure, which will slow down the bundling speed. Consider this code:

function f() { console.log('f') }
Object.defineProperty(f, 'x', {
    set(val) { console.log('side effect') }
})
f.x = 1 // should not be removed because of side effect
f.y = 2 // should be removed

Currently none of the bundlers could do what I commented in the code. There's also no such logic to replicate rollup's work in esbuild.

On the other hand, if esbuild has some special rule on tree-shaking, rollup has another, webpack(terser) has another2, what should we do? Maybe we still have to find an intersection of those rules and obviousely that will be close to the two rules mentioned above.

marvinhagemeister commented 2 years ago

I think you're omitting a critical piece in your explanation. In my scenario there is no such thing as a hidden setter via Object.defineProperty or something similar. The notion that an assignment equals a side effect needs to be made more granular. Let's maybe look at another example:

function a() {}
a.x = 2

export function b() {}

If the user only imports b in his code, I'd expect function a to be dropped as that is not referenced anywhere in the users code. Since this is the whole code and there is no call to Object.defineProperty or other shenanigans anywhere it's not a side effect in that it must be preserved.

I fully agree with you that In more complex scenarios like yours where there are explicit calls to Object.defineProperty esbuild should include the function. Main reason being that proving that a piece of code has no side effect is not worth the complexity caused by Object.defineProperty.

But again, we're not dealing with the complex case here in the original issue description. The scenario I've described is a lot simple for proving that there is no side effect.

evanw commented 2 years ago

That case could still involve Object.defineProperty. Hypothetically some other code on the page could do this:

Object.defineProperty(Function.prototype, 'x', {
  set: function() { console.log('side effect') }
})

This was the original reasoning behind not even attempting to support this. The general approach esbuild takes is to only remove code when it can prove that something is side-effect free. It doesn't only keep code when it can prove that it's not side-effect free, since that would require esbuild to have perfect analysis (which is impossible). This is the safest thing to do.

That said, I agree that many people would prefer that esbuild ignore the possibility of getter and setter side effects on built-ins (bare objects, functions, function prototypes, etc.). This is especially true because other tools in the ecosystem also do transformations that are unsafe in the presence of built-in modifications. Along those lines it would be nice for esbuild to support this, but supporting this is going to take more work. It's not a straightforward addition.

I can think of two possible implementation strategies:

Tree shaking in esbuild currently works by a) making each top-level statement a node in a graph, b) pulling in all nodes with side effects, and c) having those nodes pull in all non-side-effect nodes that they reference (recursively). One way to do this within esbuild's tree shaking approach is to potentially make a third kind of node for "side effects but only if a certain node is included." The top-level statement a.x = 2 clearly has side effects because removing it causes an observable behavior change, but that's only observable if a is referenced (under the assumption of no modification to built-ins). This is kind of weird because typically the edges in the graph go the other way (a.x = 2 normally pulls in a, but this requires a to pull in a.x = 2).
I could potentially make the output of var notIncluded = /* @__PURE__ */ (() => { ... })() more compact. Specifically the closure could be inlined but the resulting top-level statements could still be treated as a single tree-shaking unit and esbuild could still pretend they don't have side effects.

The second approach is more general-purpose and would work even in the presence of non-analyzable expressions. For example, something like a.x = foo(a) would prevent the first approach from tree-shaking a but not the second approach because foo might be something like this:

function foo(x) {
  Object.defineProperty(x, 'x', {
    set: function() { throw 'side effect' }
  })
}

The second approach would still treat foo as side-effect free since the author has explicitly marked it as such. I suppose you could get the first approach to work in this case using a.x = /* @__PURE__ */ foo(a) but that only works for function calls, not other expressions that could also capture a and run code on it.

There's also an argument for doing both of these approaches. But this is even more code to design, build, test, and maintain, so there's also an argument for not doing that.

evanw commented 2 years ago

I guess another potential problem is that function declarations are hoisted and can be modified by other modules before the code around the declaration is reached. So hypothetically some other file could import it and call Object.defineProperty on it to add a setter before the property assignments are run. So maybe this shouldn't be done for function declarations.

Edit: Wait never mind. Then the function would be considered used, so that should work.

marvinhagemeister commented 2 years ago

You bring up some good points. Admittedly I haven't fully grasped in depth under which rules rollup and others determine this to be side effect free or at least safe to drop. So far it seems like there are assumptions at play which could be defeated. But maybe that just works out in fine in practice given the way JS code is written in the real world. Pinging @lukastaegert from Rollup. Maybe he can share some insight.

Just stumbled on another good use case which illustrates the use case a little better. Imagine you have a file with a React component and an unrelated export:

// React component which renders "Hello Unknown"
// if "props.name" is not set
export function MyComponent(props) {
  return <h1>Hello {props.name}</h1>
}

// Used to define default values that are merged into
// the component's `props` argument upon invocation
MyComponent.defaultProps = {
  name: "Unknown"
};

// You can also add custom component names for the devtools browser extension
MyComponent.displayName = "MyCoolComponent";

// Unrelated export, user only imports this
export const A = 42

Here the user only imports the const A and not the MyComponent function. So that code should be dropped.

lukastaegert commented 2 years ago

Rollup determines a line to be a side effect if it

Could throw an error, like adding a property to something that could be null
Mutates a variable that is used
Calls a function we do not know to be side effect free

However, there are many subtleties hidden behind those rules.

TheColaber commented 10 months ago

Was there any workaround found for this?

evanw commented 10 months ago

Was there any workaround found for this?

See https://github.com/evanw/esbuild/issues/2010#issuecomment-1035765782 above.

aleclarson commented 3 months ago

Perhaps there could be baby steps made to improve this situation. For example, there could be an option called pureGlobalPropertyAccess that assumes all property access of a global identifeir to be pure. Other options could be added too, like pureTopLevelAssignment.

evanw / esbuild

Importing a pure function includes unrelated code #2010

Actual Output:

Expected Output