facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License

Files in `exclude` are still processed by the build system, leading to hangs #10708

Open LegNeato opened 4 days ago

LegNeato commented 4 days ago

Description

I have a site that uses the classic preset. There is some Rust code in one of the blog directories (blog/2024-11-21-optimizing-matrix-mul/code). I explicitly exclude the Rust code and subdirectories from the blog config:

```ts
presets: [
  [
    "classic",
    {
      blog: {
        exclude: ["*/code/**"],
```

[Full config is on GitHub](https://github.com/Rust-GPU/rust-gpu.github.io/blob/main/docusaurus.config.ts)

Rust puts the target/ output directory with intermediate and compiled artifacts in the top-level (that is, blog/2024-11-21-optimizing-matrix-mul/code/target). After doing a couple of Rust builds (cargo build) and benchmark runs (cargo bench) the target directory is full of many files, some large.

With a full target/ directory, docusaurus stops building the website successfully. Both yarn start and yarn build make some progress and then stall out with node using 100% CPU. I have waited for 20 mins with no forward progress (usually builds are < 10s). Deleting the blog/2024-11-21-optimizing-matrix-mul/code/target directory's many files enables the build to make forward progress and succeed.

Note that when I run yarn start and then do a Rust build in the supposedly excluded directory, I can see the client hot-reloading / compiling being kicked off as well.

I know the exclude entry in the config is being respected, as previously I had to put a truncate marker in README.md files in the Rust code (I have missing markers set to throw). After I put the exclude entry in the settings, markdown files in code were correctly not being treated as blog posts and I could remove the truncate markers.

So, it appears that something in the build process is globbing or reading/processing each file and then applying the excludes.
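
To illustrate the hypothesis above, here is a minimal sketch contrasting "walk everything, then apply excludes" with "apply excludes while walking". It uses a hypothetical in-memory tree and a simplified exclude rule; the names `walkThenFilter`, `walkWithPruning`, and `isExcluded` are made up for this example and are not Docusaurus APIs. It does not claim to show how Docusaurus actually scans files.

```typescript
// Hypothetical in-memory file tree used only for illustration; this is not Docusaurus code.
type TreeNode = { name: string; children?: TreeNode[] };
type WalkResult = { files: string[]; visited: number };

// Strategy A: visit every node first, then drop excluded paths from the result.
function walkThenFilter(root: TreeNode, isExcluded: (path: string) => boolean): WalkResult {
  let visited = 0;
  const all: string[] = [];
  const visit = (node: TreeNode, prefix: string): void => {
    const path = prefix ? `${prefix}/${node.name}` : node.name;
    visited += 1;
    if (node.children) {
      node.children.forEach((child) => visit(child, path));
    } else {
      all.push(path);
    }
  };
  visit(root, "");
  return { files: all.filter((p) => !isExcluded(p)), visited };
}

// Strategy B: check the exclude rule before descending, so excluded subtrees are never visited.
function walkWithPruning(root: TreeNode, isExcluded: (path: string) => boolean): WalkResult {
  let visited = 0;
  const files: string[] = [];
  const visit = (node: TreeNode, prefix: string): void => {
    const path = prefix ? `${prefix}/${node.name}` : node.name;
    if (isExcluded(path)) {
      return;
    }
    visited += 1;
    if (node.children) {
      node.children.forEach((child) => visit(child, path));
    } else {
      files.push(path);
    }
  };
  visit(root, "");
  return { files, visited };
}

// Simplified stand-in for a glob such as "**/code/**": excludes anything at or under a `code` directory.
const isExcluded = (path: string): boolean => path.split("/").includes("code");

// Tiny stand-in for a blog directory with a Rust `target` folder inside an excluded `code` folder.
const tree: TreeNode = {
  name: "blog",
  children: [
    { name: "post.md" },
    {
      name: "2024-post",
      children: [
        { name: "index.md" },
        {
          name: "code",
          children: [
            { name: "target", children: [{ name: "a.o" }, { name: "b.o" }, { name: "c.o" }] },
          ],
        },
      ],
    },
  ],
};
```

With this tree, both strategies return the same two Markdown files, but the first visits 9 nodes and the second visits 4. With a large `target/` directory the gap would grow with the number of files in it, which is the kind of cost the hypothesis describes.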

Reproducible demo

https://github.com/Rust-GPU/rust-gpu.github.io

Steps to reproduce

  1. Exclude files in some path in the blog plugin
  2. Run yarn start
  3. touch a file in the excluded path
  4. See that a build is triggered

Expected behavior

Build completes.

Actual behavior

Build is triggered. If there are tons of files, node goes to 100% CPU usage during yarn build, no progress on the build. Deleting the files in "exclude" directory allows the build to make progress and finish.

slorber commented 3 days ago

Can you provide a full repro branch where I can reliably reproduce the problem?

If cargo generates files, then please commit them.

slorber commented 3 days ago

Your exclude overrides the default, maybe the problem could be related:

  include: ['**/*.{md,mdx}'],
  exclude: [
      '**/_*.{js,jsx,ts,tsx,md,mdx}',
      '**/_*/**',
      '**/*.test.{js,jsx,ts,tsx}',
      '**/__tests__/**',
  ]

Maybe try appending to that exclude list instead.

And also try with ** instead of *:

  exclude: [
      '**/_*.{js,jsx,ts,tsx,md,mdx}',
      '**/_*/**',
      '**/*.test.{js,jsx,ts,tsx}',
      '**/__tests__/**',
      '**/code/**',
  ]
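
As a sketch of "appending to that exclude list", the defaults quoted above can be copied into a constant and spread into the custom list, so that setting `exclude` no longer drops them. The names `defaultBlogExclude` and `blogOptions` are hypothetical and chosen for this example only; the default patterns are copied by hand from the comment above rather than imported from Docusaurus.

```typescript
// Default blog exclude patterns, copied by hand from the list quoted above.
const defaultBlogExclude: string[] = [
  "**/_*.{js,jsx,ts,tsx,md,mdx}",
  "**/_*/**",
  "**/*.test.{js,jsx,ts,tsx}",
  "**/__tests__/**",
];

// Hypothetical options object: keeps the defaults and appends the Rust code directories.
const blogOptions = {
  exclude: [...defaultBlogExclude, "**/code/**"],
};
```

In a real config, an object like `blogOptions` would be used as the `blog` entry of the classic preset options.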

LegNeato commented 3 days ago

I can't commit the files as they are large. Better steps:

  1. Checkout https://github.com/Rust-GPU/rust-gpu.github.io
  2. Run yarn start
  3. touch blog/2024-11-21-optimizing-matrix-mul/code/foo
  4. See that a docusaurus rebuild gets triggered

I'll try the other configs, but again I know the blog code at least is using that exclude.

LegNeato commented 3 days ago

I tried the excludes specified above, with "*" and "**". The build is still being triggered.

slorber commented 3 days ago

The problem is that you assume I have cargo installed and that it's safe for me to run random commands from ecosystems I don't know much about 😅 all this to help a single user. Cargo is not part of Docusaurus, so please don't ask me to install a tool I don't need. I can also accept a zip hosted anywhere.

What I understand is that this bug only affects HMR, and it gets triggered even by files that are not excluded. I think I see the problem in our getPathsToWatch() lifecycle implementation.

But have you also modified the include option? Because by default it's only supposed to check md/mdx files, not all.
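
The behaviour expected in this discussion can be written down as a small predicate: a changed file should only trigger a rebuild if it matches an `include` pattern and no `exclude` pattern. The sketch below is an assumption-laden illustration, not the `getPathsToWatch()` implementation. `globToRegExp` is a hypothetical helper that supports only `*` and `**` (no brace expansion, so the default `**/*.{md,mdx}` is written as two patterns), and `shouldTriggerRebuild` is a made-up name.

```typescript
// Hypothetical minimal glob matcher: supports only `*`, `**` and `**/`. Not a real glob library.
function globToRegExp(glob: string): RegExp {
  let source = "^";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      source += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob.startsWith("**", i)) {
      source += ".*"; // anything, including directory separators
      i += 2;
    } else if (glob.charAt(i) === "*") {
      source += "[^/]*"; // anything within a single path segment
      i += 1;
    } else {
      // Escape characters that are special in regular expressions.
      source += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp(source + "$");
}

// The default include, split in two because this helper has no brace expansion.
const includePatterns = ["**/*.md", "**/*.mdx"].map(globToRegExp);
// The exclude pattern suggested earlier in the thread.
const excludePatterns = ["**/code/**"].map(globToRegExp);

// Expected behaviour: rebuild only for included files that are not excluded.
// Paths are relative to the blog content directory.
function shouldTriggerRebuild(relativePath: string): boolean {
  const included = includePatterns.some((re) => re.test(relativePath));
  const excluded = excludePatterns.some((re) => re.test(relativePath));
  return included && !excluded;
}
```

Under these assumptions, a path such as `2024-11-21-optimizing-matrix-mul/code/foo` fails the predicate twice over: it is not a Markdown file and it sits under an excluded directory.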

LegNeato commented 3 days ago

I updated the repro, using touch is sufficient (we must have crossed comments!)

LegNeato commented 3 days ago

It does not only affect HMR; it was affecting yarn build as well. HMR is just easier to see, as yarn build doesn't error out unless there are a ton of files (or maybe big files? not sure, cargo shoves a bunch of junk in there).

LegNeato commented 3 days ago

I have not modified the include option, there is no include key set in my config.

slorber commented 1 day ago

Thanks, I'll take a look

Note that the problems you see with yarn start and yarn build are likely different. With the current repro I can only troubleshoot that HMR problem.

If you have another problem with yarn build, I'd need another distinct repro for it.

LegNeato commented 20 hours ago

Ok, sounds good! If I have time this week I'll poke around too.