gohugoio / hugo

The world’s fastest framework for building websites.
https://gohugo.io
Apache License 2.0

server: Rebuilds on new page creation on very big sites take a lot of time #12095

Open TiGR opened 8 months ago

TiGR commented 8 months ago

We have an issue where creating or editing a new empty page triggers a rebuild that takes 5-20 seconds. I looked through the debug log and found that Hugo prints things like this:

DEBUG Direct dependencies of "/news/test" (*hugolib.pageState page-2) =>

Here /news/test is a heavy page that contains a lot of references to other pages, plus pagination. But it has no references to the newly created page, neither direct nor in the form of lists.

I've recreated a simple setup to get the same results. So, the content structure is like this:

content/
├── _index.md
└── news
    ├── _index.md
    └── test
        └── _index.md

I create a new page at content/news/2024/02/1/index.md. This page is completely unrelated to news/test, but when I create that file, I get this in the debug log:

DEBUG Received System Events: [CREATE        "[projectRoot]/content/news/2024/02/1/index.md" CHMOD         "[projectRoot]/content/news/2024/02/1/index.md"]

Change detected, rebuilding site (#2).
2024-02-21 13:02:30.466 +0300
DEBUG cachebuster: Matching "content/news/2024/02/1/index.md" with source "(postcss|tailwind)\\.config\\.js": no match
Source changed /news/2024/02/1/index.md
DEBUG Direct dependencies of "/news/test" (*hugolib.pageState page-4) =>

news/test should not have been mentioned at all.

So, rebuilds take more time, since hugo rebuilds completely unrelated pages.

The issue is reproducible on 0.123.1 too, but it is not there in 0.122.

bep commented 8 months ago

Note that for new content pages, we take a sample of surrounding pages (e.g. in the same section), which I suspect is what you see. So, v0.122.0 might have done this faster, but it left a lot of stale content that was not updated.
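
To make the idea of "a sample of surrounding pages" concrete, here is a purely illustrative sketch. It is not Hugo's implementation: the names sectionOf and sampleNeighbours are hypothetical, and it assumes that "same section" means "same first path element", which is a simplification because real sections can nest.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// sectionOf returns the first path element, e.g. "/news" for "/news/2024/02/1".
// Assumption: only top-level sections are considered.
func sectionOf(pagePath string) string {
	trimmed := strings.TrimPrefix(path.Clean("/"+pagePath), "/")
	first, _, _ := strings.Cut(trimmed, "/")
	return "/" + first
}

// sampleNeighbours returns up to limit existing pages that share a
// top-level section with newPage. These are the pages a rebuild would
// re-check in this simplified model.
func sampleNeighbours(newPage string, existing []string, limit int) []string {
	section := sectionOf(newPage)
	var out []string
	for _, p := range existing {
		if len(out) >= limit {
			break
		}
		if p != newPage && sectionOf(p) == section {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	existing := []string{"/", "/news", "/news/test"}
	for _, p := range sampleNeighbours("/news/2024/02/1", existing, 5) {
		fmt.Println("would re-check:", p)
	}
}
```

Under this model, /news/test is picked up only because it lives under /news, not because it references the new page.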

This is, however, not something I have seen as a problem, but I welcome any well thought out fixes and improvements in this area.

TiGR commented 8 months ago

Just a simple edit of a single news article (without creating a new page):

[screenshot: hugo-sshot]

bep commented 8 months ago

Yeah, well, your other stats (400s total build time, 80k images, but just 6-7Kx3 content pages) tell me that there's more to this story, but this is not my site, so I cannot tell. If you could run hugo server --logLevel info and post a screenshot of the rebuild, that may help.

This is the rebuild on content change for the Hugo docs repo (800 content pages):

[image]

TiGR commented 8 months ago

Single file change with --logLevel info:

Source changed /news/2024/02/121547/index.md
INFO  build:  step process substep resolve page output change set changes 1 checked 36889 matches 442624 duration 87.7978ms
INFO  build:  step process substep gc dynacache duration 43.4158ms
INFO  build:  step process substep collect files 4 files_total 4 duration 20.0337ms
INFO  build:  step process duration 176.5031ms
INFO  build:  step assemble duration 811.8593ms
INFO  build:  step render substep pages site en outputFormat html duration 5.2544655s
INFO  build:  step render substep pages site en outputFormat csv duration 14.9767ms
INFO  build:  step render substep pages site en outputFormat json duration 26.2060554s
INFO  build:  step render substep pages site en outputFormat rss duration 90.2031ms
INFO  build:  step render substep pages site ru outputFormat html duration 584.0109ms
INFO  build:  step render substep pages site ru outputFormat csv duration 22.8902ms
INFO  build:  step render substep pages site ru outputFormat json duration 16.2382ms
INFO  build:  step render substep pages site ru outputFormat rss duration 18.7267ms
INFO  build:  step render substep pages site de outputFormat html duration 653.7476ms
INFO  build:  step render substep pages site de outputFormat csv duration 31.7475ms
INFO  build:  step render substep pages site de outputFormat json duration 16.3283ms
INFO  build:  step render substep pages site de outputFormat rss duration 20.1423ms
INFO  build:  step render pages 6 content 4514 duration 33.531832s
INFO  build:  step postProcess duration 9.496ms
INFO  build:  duration 34.5699731s
Total in 34570 ms

TiGR commented 7 months ago

I'm trying to investigate this. I have a question: how does Hugo generate the list of direct dependencies shown in the debug output? I see a lot of images there, even for a page that uses no images (it only scans through pages using .Pages or .RegularPages and picks some of them). We get something like this:

DEBUG Direct dependencies of "/news/info" (*hugolib.pageState page-11870) =>
Direct dependencies of "/news/events" (*hugolib.pageState page-11878) =>
          __anonymous
          /studies/mango/events/43/img_7712.jpg
          /studies/bunch/events/29/img_2425.jpg
          _default/_markup/render-link.html
          /studies/london/events/24051310
          /studies/paris/events/13/img_7304.jpg
          partials/get_resource.html
          /studies/london
          shortcodes/events_img.html
          /studies/mango/events/43/img_2029.jpg
          /studies/paris/events/13/img_7310.jpg
          shortcodes/resource_img.html
          /studies/paris/events/13/img_7275.jpg
          /studies/paris/events/13/img_7320.jpg
          /
          /studies/mango/events/43/img_1469.jpg
          /studies/berlin/events/4/img_4925.jpg
          partials/ensure_jpg.html
          /studies/mango/events/43/img_1628.jpg
          /studies/paris/events/13/img_7284.jpg

None of these images is used on that page or directly referenced there. Also, why is "/" in the list? The index page is not referenced there either.

If I comment out the layout, everything becomes fast. But if I uncomment only a local partial defined with {{ define "partials/_our_func" }}, the rebuild takes just as long again, even though that partial is not actually used in that template (or in any template).

So is there a way to figure out how this list is made up?