dividab / tsconfig-paths

Load node modules according to tsconfig paths, in run-time or via API.
MIT License
1.82k stars 102 forks source link

`findFirstExistingPath` is extremely slow #72

Open Swatinem opened 5 years ago

Swatinem commented 5 years ago

We are currently evaluating using tsconfig-paths/register together with @babel/register for a pretty large, mostly js project (slowly migrating to ts)

I have found that both of these register hooks have an extreme overhead. For tsconfig-paths, most of that seems to come from a huge amount of filesystem accesses. Using it delays the startup of our app from ~5s to ~7s, and a profile shows quite clearly where the time is going:

bildschirmfoto von 2019-01-22 13-21-51

I haven’t looked too much into what is exactly happening, but what I can say from the profile is that it apparently probes a ton of files before settling on which one to actually use. I will continue investigating and will update this issue with any findings.

Swatinem commented 5 years ago

So, the problem seems to be this:

https://github.com/dividab/tsconfig-paths/blob/6beaeaf43ffeab1499d1a03e090cac3dc7d96777/src/mapping-entry.ts#L37-L44

tsconfig-paths adds a catch-all rules, and runs its own path resolution algorithm, which apparently is much slower than the one of nodejs itself.

I mean the comment there explains pretty clearly why it does that, but the problem is that it runs the path resolution for all of node_modules, which is super expensive if you have a huge dependency tree. Please at least make this behavior configurable. I only explicitly care about the pathmapping rules that I defined and do not want this register hook to mess with how node_modules are resolved.

jonaskello commented 5 years ago

I think it seems reasonable to make it configurable. It is open for a PR if you want to add it.

Swatinem commented 5 years ago

Thanks, will do :-)

Swatinem commented 5 years ago

just tried commenting out the code and can confirm that removing the match-all cuts the fs.statSync time from ~2s to ~500ms, which indeed solves half of the problem (the other half being actually making this fast)

issacgerges commented 5 years ago

Running into this as well at the moment. Is it possible to invert, or provide configuration to invert the module resolution strategy? So, if specified, allow the original module resolver to be run first, and if it wasn't able to locate the module then run the internal ts-config-paths resolver? Any thoughts @jonaskello? Worth noting this would likely fix the 2nd half of @Swatinem issue, where tslint is incorrectly being resolved.

Swatinem commented 5 years ago

allow the original module resolver to be run first, and if it wasn't able to locate the module then run the internal ts-config-paths resolver?

Hm, in our specific usecase, we want to rewrite dist files (based on their package.json/main field) to src files. So the original module resolver will find my-package just fine (in a lerna monorepo). I just want to rewrite it to use the original source files, combined with ts-node or @babel/register.

Its just that the match-all rule is something that normal node developers just do not expect.

I have also looked at https://www.typescriptlang.org/docs/handbook/module-resolution.html shortly. And I think that the match-all behavior is what the classic resolution algorithm is:

A non-relative import to moduleB such as import { b } from "moduleB", in a source file /root/src/folder/A.ts, would result in attempting the following locations for locating "moduleB":

  1. /root/src/folder/moduleB.ts

That is basically what this match-all rule achieves, plus it uses all the extensions, such as .mjs and .json (leading to #74 )

But when you specify moduleResolution: node it follows nodejs:

Similarly a non-relative import will follow the Node.js resolution logic, first looking up a file, then looking up an applicable folder. So import { b } from "moduleB" in source file /root/src/moduleA.ts would result in the following lookups:

  1. /root/src/node_modules/moduleB.ts

So I have looked it up and we definitely have the wrong moduleResolution setting in our tsconfig, but working on #73 I haven’t seen that tsconfig-paths follows this tsconfig setting either.

tomspeak commented 5 years ago

@Swatinem — I have upgraded to the new version that contains your fix, I am still getting a 7 - 12 second boot up time for a relatively small project that's 98% JavaScript (we're trying to move to TS incrementally).

Without this plugin we'll have to go back to ../../../../util hell, any insight to improve performance would be appreciated.

Swatinem commented 5 years ago

Since I didn’t want this to break anything, the default is to still enable the match-all rule. You have to explicitly disable it: https://github.com/dividab/tsconfig-paths#register

jonaskello commented 5 years ago

Without this plugin we'll have to go back to ../../../../util hell, any insight to improve performance would be appreciated.

Just a thought, have you tried setting it up as a monorepo with tsc --build? This is what I do for most projects these days, and this way you don't need any paths in tsconfig.json since you resolve everything with regular node resolution (and thus there is no need for this package). Of course this may not fit every project so paths is still nice to have, but it might be an option worth investigating if you haven't already done so.

tomspeak commented 5 years ago

Would you mind sharing any links or further expanding on what you suggested? I’d like to investigate it further but our team is new to TS.

So far this is our biggest hurdle trying to transition. Face the slow compile times or extra generation step, or return to path tree madness that we were shielded from via using NODE_PATH! 😞

jonaskello commented 5 years ago

So monorepos is not typescript specific. It just means you have several packages in the same git repo. So you split your app in several npm packages with their own package.json even if you don't intend to publish them because then you can import them using import * as Utils from "@myapp/utils". For this to work you need to create symlinks in the top-level node_modules that link to the code in each package. There are several tools that help with this, I would suggest reading up on yarn workspaces or if you are not using yarn then you can use lerna. Here is an exampe of using typescript with a monorepo (although in this example paths are still used but still it is a good example to start with). You can google "typescript monorepo" for more examples.

Once you have split your app into separate packages, you can make tsc build only the packages that have changed and thus get faster build times (this feature is known as "project references" or tsc --build). Here is one example repo for this approach. There is also this issue which may contain some useful links.

The basic idea is that all packages reference each other through regular node resolution, which is just look upward for node_modules/packagename. And since the symlinks are in place the packages will find each other without having to resort to relative file paths in imports.

tomspeak commented 5 years ago

@jonaskello - thank you, that has been incredibly informative, and the number of tabs I now have open is a testament to the rabbit hole I've been pulled into.

This looks like a great direction to go, and using lerna will fix a few problems we saw on the horizon (shared library for frontend and backend)...!

In the mean time, would you help me run the new performance change? I am using it via ts-node -r tsconfig-paths/register. As someone new to this ecosystem, I'm struggling to see how to pass arguments into this based off the new PR, README and tests. It is not clear to me if it's possible via the CLI or I must now programmatically call it in index.ts.

Thank you.

Swatinem commented 5 years ago

I am using it via ts-node -r tsconfig-paths/register. It is not clear to me if it's possible via the CLI or I must now programmatically call it in index.ts.

You have to create your own register hook, like explained here: https://github.com/dividab/tsconfig-paths#bootstraping-with-explicit-params

tomspeak commented 5 years ago

@Swatinem — thanks! I got it working using that method, and there's no difference in speed for me regardless of what addMatchAll is set to. In the profiler, the same function is taking up the most amount of time for me as your origin screenshot, so it's odd I haven't had any improvement!

Swatinem commented 5 years ago

Hm, I just tested this on our codebase, and with the following ./tools/register.js, I see a reduction from ~2s to ~500ms with the latest update using the script

require('source-map-support').install()
const fsExtra = require('fs-extra')
const path = require('path')
const tsConfigPaths = require('tsconfig-paths')

const baseUrl = path.join(__dirname, '..')
// eslint-disable-next-line
const tsConfig = new Function(`return ${fsExtra.readFileSync(path.join(baseUrl, 'tsconfig.json'), 'utf-8')}`)()

tsConfigPaths.register({
  baseUrl,
  paths: tsConfig.compilerOptions.paths,
  addMatchAll: false,
})
tomspeak commented 5 years ago

We cannot for the life us get this to improve the compilation speed. We have ultimately been forced to back down from TS until we move to the yarn workspaces/monorepo set up so module resolution will work out of box without an additional step.

Thanks a lot for your help!

BJChen990 commented 5 years ago

Hi all, just run into the same problem that the findFirstExistingPath is extremely slow, but by looking at the frame graph, I think there’s also another problem that causes it extremely slow:

TL;DR

Using fs.statSync inside findFirstExistingPath when resolving path make it super slow for programs that uses require() inline. Suggest to fix with caching.

The problem

Using Typescript and postcards with webpack to build our own web application. Most of the part that includes require is super slow, like this part in html-webpack-plugin:

// setup hooks for webpack 4
    if (compiler.hooks) {
      compiler.hooks.compilation.tap('HtmlWebpackPluginHooks', compilation => {
--->    const SyncWaterfallHook = require('tapable').SyncWaterfallHook;
        const AsyncSeriesWaterfallHook = require('tapable').AsyncSeriesWaterfallHook;
        compilation.hooks.htmlWebpackPluginAlterChunks = new SyncWaterfallHook(['chunks', 'objectWithPluginRef']);
        compilation.hooks.htmlWebpackPluginBeforeHtmlGeneration = new AsyncSeriesWaterfallHook(['pluginArgs']);
        compilation.hooks.htmlWebpackPluginBeforeHtmlProcessing = new AsyncSeriesWaterfallHook(['pluginArgs']);
        compilation.hooks.htmlWebpackPluginAlterAssetTags = new AsyncSeriesWaterfallHook(['pluginArgs']);
        compilation.hooks.htmlWebpackPluginAfterHtmlProcessing = new AsyncSeriesWaterfallHook(['pluginArgs']);
        compilation.hooks.htmlWebpackPluginAfterEmit = new AsyncSeriesWaterfallHook(['pluginArgs']);
      });
    }

Reason

Every time the require is called, the Module._resolveFilename is called (which we replaced with our own version) and it uses an fs.statSync!!

For some Js project, there’re still quite a lot of places that put the require in the function body instead of extracting then out to the top of the script, which cause this problem. But still, it is not a good thing that the part called xxxSync is not cached as it would easily become slow.

Screenshots: The parts repeat and take quite a lot of times

Screen Shot 2019-03-14 at 8 45 48 AM

It is stocked by the fs.statSync

Screen Shot 2019-03-14 at 8 47 08 AM

Suggestion

I'm not familiar with the ts-config code base but I think we can introduce cache to the part https://github.com/dividab/tsconfig-paths/blob/master/src/register.ts#L82 maybe something like this:

// Some where as a file variable
const NOT_EXISTS = {};
const __matchPathCache = {};

// Inside register.js
Module._resolveFilename = function(request: string, _parent: any): string {
  const isCoreModule = coreModules.hasOwnProperty(request);
  if (!isCoreModule) {
+   if (request) {
+     found = __matchPathCache[request];
+   } else {
+     found = matchPath(request);
+     __matchPathCache = found || NOT_EXISTS;
+   }
+   if (found && found !== NOT_EXISTS) {
      const modifiedArguments = [found, ...[].slice.call(arguments, 1)]; 
      // Passes all arguments. Even those that is not specified above.
      // tslint:disable-next-line:no-invalid-this
      return originalResolveFilename.apply(this, modifiedArguments);
    }
  }
  // tslint:disable-next-line:no-invalid-this
  return originalResolveFilename.apply(this, arguments);
};
fterradev commented 2 years ago

Running into this as well at the moment. Is it possible to invert, or provide configuration to invert the module resolution strategy? So, if specified, allow the original module resolver to be run first, and if it wasn't able to locate the module then run the internal ts-config-paths resolver? Any thoughts @jonaskello? Worth noting this would likely fix the 2nd half of @Swatinem issue, where tslint is incorrectly being resolved.

~@issacgerges This look like a good idea to me!~

~Having the same slowness issue here.~

fterradev commented 2 years ago

Sorry, actually I just notice this and this, which already fixes my issue! :-)

blake-epistemix commented 1 year ago

Sorry, but what is the fix for this? I have one alias and the project is only 4 files, but this adds several seconds to nodemon. I tried the "register" function, but I either implemented it wrong, or it did not help.

tsconfig.json

...
"paths": {
    "~/*": ["./src/*"]
},
...