asfktz / autodll-webpack-plugin

Webpack's DllPlugin without the boilerplate
MIT License
1.46k stars 80 forks source link

Automatically building the dependencies list #54

Open Nargonath opened 7 years ago

Nargonath commented 7 years ago

Before finding your plugin, I tried to implement the DLL plugin myself in my projects. I thought that it would be cumbersome for my team to have to manually update the list of dependencies every time they install a dep. Plus I'm sure every now and then we would simply forget to update it. I then created a script that runs on NPM prestart hook and that would check the dependencies list in the package.json against a cached version and rebuilding the dependencies list if needed. I also added a dllIgnore field in the package.json in order to ignore package that do not need to be in the DLL (bootstrap, non-js stuff etc.).

What do you think of adding such an automation to this plugin (if possible)? It would be toggle with a new setting i.e auto: true which would be false by default. If you are ok with it, I'd be happy to make a PR for it.

Thanks for your great plugin.

asfktz commented 7 years ago

Hi @Nargonath,

Yeah, I agree it will be very useful to have such feature.

I thought about something similar earlier but two reasons kept me from implementing it:

  1. I intend the plugin to be an abstraction over the DLLPlugin, where you don't need to worry about creating a separate config or rebuilding your DLL but without making too many assumptions about the use cases.

  2. I'm still not sure if it's a good practice, see this tweet:

Currently, it is possible to achieve with a few lines of code:

const pkg = require('./package.json');
const { keys } = Object;

const ignore = ignoreList => key => !ignoreList.includes(key);

new AutoDllPlugin({
  entry: {
    deps: keys(pkg.dependencies).filter(
      ignore(['bootstrap', 'font-awesome'])
    )
  }
});

How about adding this functionally as a helper function? Something like that (not sure about the name):

var AutoDllPlugin = require('autodll-webpack-plugin');
var { readDeps } = require('autodll-webpack-plugin/helpers');

new AutoDllPlugin({
  entry: {
    deps: readDeps({
      ignore: [
        'bootstrap',
        'font-awesome'
      ]
    })
  }
});

Btw, the plugin should rebuild your DLL if you update your packages. Currently, there's a bug when using NPM, but it works with Yarn. I intend to fix it soon.

Nargonath commented 7 years ago

Hi @asfktz!

Thanks for your detailed answer.

  1. I definitely understand why you want to keep abstracting the DLL boilerplate as it is quite cumbersome. But I feel not having to care about helpers and just using an opt-in option is still heading in that direction, isn't it? I guess you don't like to have a setting in the package.json, do you? It seems accurate to me having the list of dependencies you want to ignore for the DLL right next to your dependencies list but, well, I guess it is just a matter of perspective.

  2. Thanks for the link. According to it, the only real problem would be that people could mixed devDep and actual dep in their package.json making the DLL a bit pointless. But still, they could do the same by adding unneeded dependencies to your plugin settings.

I'm not a big fan of the helper method, as I stated above, but it works either way. I'd respect your choice no matter what you choose.

We would need to make a hash (JSON.stringify) of the dependencies names along with their semver to be aware of any dependency addition/removal and version update. We wouldn't want to keep an outdated package version in our DLL. I'm not really aware of what you can and what you cannot do with a webpack plugin though.

How do you handle reacting to dependencies list update? In my own implementation I had used prestart NPM hook to check the dependencies list against my cached version every time we run npm start or yarn start. I did a little digging back then and pre|post|install NPM hook work only when installing your package but not when actually using npm install packageName or npm remove packageName. Is it the NPM bug you were referring to?

I'm looking forward to make that PR.

asfktz commented 7 years ago

How do you handle reacting to dependencies list update?

The Way I do it is by storing the previous hash in a the cache directory and compare it with the current hash.

See: https://github.com/asfktz/autodll-webpack-plugin/blob/next/src/validateCache.js

I'm not satisfied with the implantation. There is already a hash for the build, ideally package.json should be part of it that hash.

Here you can see that the hash is made from the settings passed to the plugin and there's env and id too, both for handling multiple instances: https://github.com/asfktz/autodll-webpack-plugin/blob/7e4eccda2d88c4950a582517fabab1637e8bef0b/src/createHash.js

So, in the end, it looks like this: development_instance_0_36cd3b5f6c3acd8ecfbc34b9512733f1

The cache is stored in: node_modules/.cache/autodll-webpack-plugin

I guess you don't like to have a setting in the package.json, do you? It seems accurate to me having the list of dependencies you want to ignore for the DLL right next to your dependencies list but, well, I guess it is just a matter of perspective.

Not at all, I actually prefer it over endless .dotfiles scattered around (:

You'd be surprised to know that it was actually the first use case for the plugin. It started as a PR for create-react-app, where you set your DLL dependencies from package.json itself:

https://github.com/facebookincubator/create-react-app/pull/2710/files#diff-a7f98c18479be87c9f33e7604dbd1a09R37

From the same reason I feel a bit of rejection to it, sometimes big project prefer to do things their way. On the other hand, maybe implementing it a single way can feel more of a standard. I don't know.

Now that package.json change triggers invalidation (next branch) The user can implement it with just one line of code:

new AutoDllPlugin({
  entry: {
    deps:  Object.keys(pkg.dependencies).filter((dep) => !pkg.dllIgnore.contains(dep))
  }
});

After the change, do you think it's still necessary? I open to hear.

And Thanks for willing to help, @Nargonath I appreciate it a lot (:

Nargonath commented 7 years ago

You'd be surprised to know that it was actually the first use case for the plugin. It started as a PR for create-react-app, where you set your DLL dependencies from package.json itself:

Actually that's how I read about your package in the first place. I found the original issue where you offered to use your package in CRA.

Now that package.json change triggers invalidation (next branch)

Yeah you mean that now webpack watches the package.json and whenever it changes, this line https://github.com/asfktz/autodll-webpack-plugin/blob/7e4eccda2d88c4950a582517fabab1637e8bef0b/src/plugin.js#L65 gets called and you just check if the hash is still valid otherwise you rebuild the dll, is that right? If that's right, are you sure that the deps list would get updated? I may be wrong about this, you tell me:

The user gives the list of dependencies as is:

new AutoDllPlugin({
  entry: {
    deps:  Object.keys(pkg.dependencies).filter((dep) => !pkg.dllIgnore.contains(dep))
  }
});

The above code is executed only once when the user starts webpack, isn't it? When it runs again due to webpack-dev-server or --watch mode detecting a file change, the configuration does not get called again thus the list of dependencies remains the same, doesn't it? If what I'm saying is true it means that even though we'd detect a package.json change thanks to the work you did in the next branch, the DLL would still not be updated. And again if I'm right, this would be the reason why we need to read the dependencies list within your plugin.

We should still give users the choice about whether or not they want the plugin to automatically fetch the dependencies list for them. However one problem that I can think of is when having multiple entries. There is no point of automatically setting the dependencies for each entry point because they'd get the same values. We could add a setting that is either a string or an object:

Concerning the ignored dep list, we could support both using dllIgnore inside package.json and supplying the ignore list as part of the object setting variant.

For example this would look like:

// case 1: string variant
// if dllIgnore is set in the package.json the values would be used automatically
new AutoDllPlugin({
  entry: {
    nonAutoEntry1: ['react', 'react-dom'],
    nonAutoEntry2: 'moment',
  },
  auto: 'myAutoEntry',
});

// case 2: object variant
// if the ignore attributes is given, the dllIgnore field in package.json would be ignored
new AutoDllPlugin({
  entry: {
    nonAutoEntry1: ['react', 'react-dom'],
    nonAutoEntry2: 'moment',
  },
  auto: {
    entry: 'myAutoEntry',
    ignore: ['redux'],
});

Both examples have a 3-entry DLL.

What do you think?

Thanks for your time at explaining all of this. It is appreciated as well.

asfktz commented 7 years ago

@Nargonath, you are so right! Such a silly mistake.

Yes, being able to automatically update the DLL on invalidation without restarting the dev server will be awesome!

There are quite a few changes that need to be done in order to enable it:

The main challenge as I see it is that the hash is created too early, It's created by createHash (called by createSettings) when the plugin initialized:

plugin.js:24

class AutoDLLPlugin {
  constructor(settings) {
    this._originalSettings = settings;
  }

  apply(compiler) {
    const settings = createSettings({
       originalSettings: this._originalSettings,
       index: getInstanceIndex(compiler.options.plugins, this),
       parentConfig: compiler.options
    });

    // settings.hash looks like:
    // 'development_instance_0_751bd3d7d750aeacee325e450487ec40'

    ...

Ideally, the hash should be created much later, on every run, and also include package.json in it.

I had to create it so early because I needed to assign the DllReferencePlugin to the main compiler with the path to the DLL's manifest file before the main compiler starts working.

The DLL manifest and the DLL bundle will be created later by the original DllPlugin.

It does not matter for the DllReferencePlugin that the manifest file does not exist yet, it only cares for its path.

Plugin.js:44

keys(dllConfig.entry)
      .map(getManifestPath(settings.hash))
      .forEach(manifestPath => {
        new DllReferencePlugin({
          context: context,
          manifest: manifestPath
        }).apply(compiler);
      });

// manifestPath looks somthing like:
// project/node_modules/.cache/autodll-webpack-plugin/development_instance_0_751bd3d7d750aeacee325e450487ec40/[bundleName].manifest.json

When I think about it, the hash doesn't need to be created so early at all. The cache sub dir can be just [env]_[id]_[bundleName], and the hash can be just a file in it.

That can solve the problem.

About the way to expose this feature: Yeah, the auto option it one way to go. How about allowing the user to pass a function as an entry?

Like so:

new AutoDllPlugin({
  entry: {
    deps: (pkg) => {
      return Object
          .keys(pkg.dependencies)
          .filter((dep) => !pkg.dllIgnore.contains(dep))
    }
  }
});

I belive the user can have more flexibility that way. We can also allow the user to return a promise:

new AutoDllPlugin({
  entry: {
    deps: () => {
       return readFileAsync('./otherSettingsFile.json')
         .then((settings) => settings.dll)
    }
  }
});

What do you think?

Nargonath commented 7 years ago

Actually I thought of that function option but not enough I guess. It is indeed a good idea because we can still give users the choice about the way they want to handle the dependencies and remove the hassle for us to handle multiple entries config as it is their concern. Plus I have to admit that my original suggestion was not that great. There were too many different cases to handle, yours is way more simple and straightfoward, even for us to handle. We would just need to explain its use in the README. 👍

I'll probably be working on the PR as of next week, if that's ok with you. Should I branch from master or next branch?

datapimp commented 7 years ago

Hi, just discovered this project and wanted to offer some comments. I've built a similar auto-dll plugin which I never got around to publishing, and was able to avoid having to specify any dependencies at all.

One reason I moved away from using the package.json is consider a typical server rendered React app which depends on both express and react. You don't want to bundle these in the same DLL, so relying on the package.json is not ideal.

You can add a config option to ignore things, but what is "auto" about that?

Instead, the route I eventually discovered was using the stats object that gets generated after the first run, and pulling the DLL Entries from the modules property. In there, we will see the the request which inserted that module into the build, and we can use these values to filter and reduce our DLL's entry config.

You can filter these modules by their absolute paths and eliminate anything that is outside of node_modules (or whatever other external packages folders you can pull from the webpack config). You can also filter using a variety of other properties (like if the module failed or had warnings)

The upside of this is you will only have the values that are actually used by your webpack's entry points. And since this plugin works without the DLL plugin on the first run, you will for sure have these values after any successful compilation

datapimp commented 7 years ago

Here's just a sample of one of the module entries. I initially filtered by the depth property of 2, which is where in this particular app you'll begin to see dependencies on external modules. Depth 0 are your entry modules.

{ id: './node_modules/react-dom/index.js',
  identifier: '~/node_modules/react-dom/index.js',
  name: './~/react-dom/index.js',
  index: 150,
  index2: 321,
  size: 59,
  cacheable: true,
  built: true,
  optional: false,
  prefetched: false,
  chunks: [ 0 ],
  assets: [],
  issuer: './src/index.web.js',
  issuerId: './src/index.web.js',
  issuerName: './src/index.web.js',
  profile: undefined,
  failed: false,
  errors: 0,
  warnings: 0,
  reasons:
   [ { moduleId: './src/index.web.js',
       moduleIdentifier: './src/index.web.js',
       module: './src/index.web.js',
       moduleName: './src/index.web.js',
       type: 'cjs require',
       userRequest: 'react-dom',
       loc: '5:16-36' } ],
  usedExports: true,
  providedExports: null,
  depth: 2 }

So if you find all of the module objects where the issuer is part of your project, and where the module is an external module ( most likely anything in node_modules/ but webpack is configurable for any folder, so we can't just assume that )

You'll want the unique set of userRequest values found in reasons -- that is your DLL entry config.

This seems too easy which is why I hesitated to release it, but for all of my projects it seems to work amazingly.

Good luck.

asfktz commented 7 years ago

I'll probably be working on the PR as of next week, if that's ok with you.

That will be awesome, @Nargonath!

Btw, I saw your comment on https://github.com/webpack/webpack/issues/5617 And it made me very happy (:

I care much more about the idea behind the plugin than I care for my implementation of it. If you can make something even better, I'll be happy to see it.

Should I branch from master or next branch?

I believe it's better to continue from the next branch. I added integration tests in it, which turns out to be very useful.

The next branch also includes 2 new features:

Both related to https://github.com/asfktz/autodll-webpack-plugin/issues/37. As I described in that issue, I encountered some trouble to support plugins and loaders, but the branch can be shipped as long as these two features stay undocumented.

Let me know if you need anything

asfktz commented 7 years ago

Thanks for the insights! @datapimp That's actually a very interesting approch.

Can you tell us more about the workflow?

If I understand correctly, when you build the project, it creates the DLL after the first run. But if you build the project for the second time, will it use the DLL? Or that by "first run", you mean while using the dev server? If so, what will happen if you add another dependency to your code while the dev server is running?

Thank you so much for sharing this with us, I appreciate it very much (:

datapimp commented 7 years ago

@asfktz yeah that is basically it.

What I did with my plugin was create a folder in node_modules/.cache/${ myPackage.name } and after each compiler run, save key information pulled from the webpack stats for any analysis + comparison step.

I used an md5 hash of the contents of my yarn lockfile, the compiler's target and entry properties to give me a cache key to use as my filename, and saved the module metadata in a JSON file in this cache folder.

On your first compiler run ( I forget off the top of my head which webpack compiler event this is) you know what your cache key will be, so you look for the above cache file and if it exists, read it and get your DLL config.

If it doesn't exist, you run the compiler without the DLL and then generate it when you're done. The next time that event is fired, you'll have a cache hit.

This should in theory give you a truly automatic + zero config DLL.

One thing I know is possible, but don't have any advice on doing elegantly is the sort of hot module replacement for webpack config, so that your webpack dev server can reload without being restarted.

Nargonath commented 7 years ago

@asfktz Don't hesitate to ask if you need help managing the plugin. Thanks for offering your help on the PR.

@datapimp Thanks for sharing your experiences with us. This is quite an interesting approach. I would need to dig a little more to fully understand the ins and outs.

@asfktz I'll try to implement @datapimp's way in my PR. We'll see where it leads us.

asfktz commented 7 years ago

Don't hesitate to ask if you need help managing the plugin.

I'd love that, Thanks!

I've found that amazing talk about writing webpack plugins, take a look at it, very insightful https://www.youtube.com/watch?v=NHI_PhoykVU

Nargonath commented 7 years ago

@asfktz Thank you, I'll check it out.

soederpop commented 7 years ago

@Nargonath do you need help with this?

Nargonath commented 7 years ago

@soederpop Thanks for asking. So far I haven't really worked on it, got soaked up by my personal life and I haven't find as much time as I wanted to work on OSS. Feel free to jump in! We could mutualize our effort to work this out. :+1:

Nargonath commented 6 years ago

@soederpop Did you manage to work something out?

soederpop commented 6 years ago

I haven't had a chance yet; was gonna start right now. Who can I talk to about the way the plugin works right now? to find the optimal place to inject this step. I see there is already a component which handlesStats -- I'm curious what it is for.

Nargonath commented 6 years ago

@soederpop It seems that handleStats write the stats in a json file so it could be consumed by other tools. I'm working on it as well so we can work together. :+1:

asfktz commented 6 years ago

Hi @soederpop! Thanks for willing to contribute 🙏 I'll be happy to answer any questions regarding the current design.

I'll write you later today about handlesStats and a few more important things, right now I'm late for work 😄

borisirota commented 6 years ago

Hi guys, I read the discussion - great direction 👍

One thing to keep in mind though, it's important having the ability to exclude specific node_modules from the automatic bundle - few reasons:

  1. node_modules using webpack internals have to be in the main bundle:

  2. node_modules which has to be transformed like scss, less, etc. and in some use cases need to be included in the main bundle (use case I have).

I guess that having the ability to include modules outside the node_modules directory has some benefits as well.

Thanks for great and very useful plugin 😄

asfktz commented 6 years ago

I see there is already a component which handlesStats -- I'm curious what it is for.

@soederpop handleStats ensure that the stats object is available stats is the object we get when running the compiler:

const compiler = webpack(config);
compiler.run((err, stats) => {
   console.log(stats)
});

it contains a lot of useful information about the build.

The catch is that the plugin tries to cache the DLL and only build it when necessary.

Here for example: https://github.com/asfktz/autodll-webpack-plugin/blob/bc2a2e8af94ea833f3cb9ea56cab1479bdc1ecb1/src/plugin.js#L61-L76

(excuse the silly .then((a) => { return a; }), I forgot to delete it)

On every run (regular build) and watch-run (rebuild by webpack-devserver) We call compileIfNeeded which, as the name says, compile only if needed. So in case of a fresh build, it easy, we can just return the stats object.

What handeStats does is just trying to retrieve the stats when there the build is skipped. It will first try to see if it stored in the memory - that will happen when webpack-dev-server triggers a rebuild. If not, it will try to get it from the file system. It will also store the stats in the FS on every fresh build.

Currently, the plugin doesn't do much with the stats other than, as @Nargonath said, exposing it to other plugin developers. for example, @faceyspacey it needed since his plugin webpack-flush-chunks, reads the stats to determine which assets should be included in the html, but since we build the DLL separately he had no access to it.

Another use for the stats is for testing, I saw other plugins run snapshot testing on it.

That's the main idea.

@soederpop @Nargonath Please ask me about anything that's unclear to you, I'll do my best to answer.

And when you see things that you believe that can be better simplified, please raise it up, the current design is not set in stone.

asfktz commented 6 years ago

Thanks for the insights, @borisirota! We will 👍