ayushn21 / bridgetown-sitemap

A Bridgetown plugin to generate a sitemap.xml
MIT License

Site with ~50 pages takes 20 seconds to generate sitemap #3

Closed joemasilotti closed 1 year ago

joemasilotti commented 1 year ago

My site consistently takes 20 seconds to generate the sitemap XML. There are only about 50 pages and I don't think I'm doing anything crazy.

Codebase is open source, here: https://github.com/joemasilotti/masilotti.com

Recent server log when editing a file. Note how the site rebuilds in 0.01 seconds but the sitemap takes over 20.

[Bridgetown]          Reloading… 1 file changed at 2023-02-09 08:38:18
[Bridgetown]                     - src/_posts/2023-02-09-progressively-enhanced-turbo-native-apps-in-the-app-store.md
[Bridgetown]             Done! 🎉 Completed in less than 0.01 seconds.
[Bridgetown]
[Frontend] esbuild: frontend bundling started...
[Frontend] esbuild: frontend bundling complete!
[Frontend] esbuild: entrypoints processed:
[Frontend]          - index.QOLPUBLY.js: 487B
[Frontend]          - index.RPH5C6P5.css: 44.51KB
[Bridgetown]          Reloading… 1 file changed at 2023-02-09 08:38:19
[Bridgetown]                     - .bridgetown-cache/frontend-bundling/manifest.json
[Bridgetown]    Bridgetown Feed: Generating feed for hotwire
[Bridgetown]    Bridgetown Feed: Generating feed for posts
[Bridgetown]         Pagination: disabled. Enable in site config with pagination:\n enabled: true
[Bridgetown]             Done! 🎉 Completed in less than 20.37 seconds.
[Bridgetown]
joemasilotti commented 1 year ago

For now I'm disabling sitemap and feed generation in development via this commit.

commit d1a684609862b0ccdd21347f4983efa30f7933d4
Author: Joe Masilotti <joe@masilotti.com>
Date:   Thu Feb 9 10:10:12 2023 -0800

    Only build sitemap + feed in production

    Because it takes 20+ seconds in dev for every tweak :(

diff --git a/config/initializers.rb b/config/initializers.rb
index f628228..d24cd7c 100644
--- a/config/initializers.rb
+++ b/config/initializers.rb
-  init :"bridgetown-sitemap" 
+  unless Bridgetown.env.development?
+    init :"bridgetown-feed"
+    init :"bridgetown-sitemap"
+  end
diff --git a/src/_components/head.erb b/src/_components/head.erb
index 29c8005..2f12ba8 100644
--- a/src/_components/head.erb
+++ b/src/_components/head.erb
@@ -20,7 +20,9 @@
-<%= feed_meta %>
+<% unless Bridgetown.env.development? %>
+  <%= feed_meta %>
+<% end %>
ayushn21 commented 1 year ago

@joemasilotti Are you still seeing this issue?

I just cloned your repo and enabled the two plugins in development. I'm consistently seeing build times under 5 seconds. I tried bin/bt deploy as well as bin/bt start and then edited some pages.

I've got a website with well over 500 pages (https://fslash42.com) which uses the sitemap plugin as well and I've never noticed slow build times.

I'm using an old Intel Mac (2015 spec, 6 years old), so I don't think M1/M2 performance is hiding an underlying bug.

I'm not quite sure why you're facing this issue. Can you give me any more information at all?

joemasilotti commented 1 year ago

The sitemap no longer generates in development. Add BRIDGETOWN_ENV=production to the front of the build to see the long build times.

ayushn21 commented 1 year ago

> The sitemap no longer generates in development

I edited initializers.rb to remove the conditional, and I also edited head.erb to remove the conditional around feed_meta. Basically, I undid the commit you described above before testing.

Also I double checked that the feed and sitemap were being generated in the output.

joemasilotti commented 1 year ago

Weird, I'm consistently getting 10+ seconds. I'm running a 2019 16" MacBook Pro i9.

➜  bridgetown git:(main) BRIDGETOWN_ENV=production bin/bt s
[Bridgetown]           Starting: Bridgetown v1.2.0 (codename "Bonny Slope")
[Server] * Puma version: 5.6.5 (ruby 3.1.2-p20) ("Birdie's Version")
[Server] * Cluster Master PID: 64802
[Server] *      Workers: 4
[Server] *     Restarts: (✔) hot (✖) phased
[Server] * Preloading application
[Bridgetown]        Environment: production
[Bridgetown]             Source: /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/src
[Bridgetown]        Destination: /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/output
[Bridgetown]     Custom Plugins: /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/plugins
[Bridgetown]         Generating…
[Server] * Listening on http://0.0.0.0:4000
[Server] Use Ctrl-C to stop
[Server] - Worker 0 (PID: 64803) booted in 0.01s, phase: 0
[Server] - Worker 1 (PID: 64804) booted in 0.0s, phase: 0
[Server] - Worker 2 (PID: 64805) booted in 0.0s, phase: 0
[Server] - Worker 3 (PID: 64806) booted in 0.0s, phase: 0
[Bridgetown]    Bridgetown Feed: Generating feed for hotwire
[Bridgetown]    Bridgetown Feed: Generating feed for posts
[Bridgetown]         Pagination: disabled. Enable in site config with pagination:\n enabled: true
[Bridgetown]            esbuild: There was an error parsing your esbuild manifest file. Please check your esbuild manifest file for any errors.
[Bridgetown]             Done! 🎉 Completed in less than 11.43 seconds.
[Bridgetown]
[Bridgetown]     Now serving at: http://localhost:4000
[Bridgetown]                     http://192.168.50.244:4000
[Bridgetown]
ayushn21 commented 1 year ago

Hmmm, ok so this is weird.

When I tested it this morning, I downloaded your repo as a zip instead of running git clone.

Just now, I cloned it using git clone and saw ~10-11s build times. Then I ran rm -rf .git and the build times went down to ~4-5 seconds.

Any idea why that might be?

ayushn21 commented 1 year ago

I've tracked down the problem. I use git to determine the date the file was last edited on:

      def latest_git_commit_date
        return nil unless git_repo?

        date = `git log -1 --pretty="format:%cI" "#{path}"`
        Time.parse(date) if date.present?
      end

That's causing the slow build time in your repo. Let me have a think about how I can fix this. If you have any ideas, please share!

joemasilotti commented 1 year ago

Interesting! I'd be happy to have to explicitly set a last updated date in the front matter that defaults to the publish date.
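
As a rough illustration of the front matter idea above, here is a minimal sketch, not the plugin's actual code. The `last_modified_at` key and the `last_modified_for` helper are names assumed for this example; the point is only that the value comes from data already in memory, with the publish date as the fallback.

```ruby
require "time"

# Hypothetical helper: choose the last-modified value from front matter that is
# already loaded, so no git call or extra disk read is needed per page.
# "last_modified_at" is an assumed key name, not confirmed plugin behaviour.
def last_modified_for(front_matter)
  value = front_matter["last_modified_at"] || front_matter["date"]
  return nil if value.nil?

  value.is_a?(Time) ? value : Time.parse(value.to_s)
end

explicit = { "date" => "2023-02-09", "last_modified_at" => "2023-04-04T09:51:04-07:00" }
fallback = { "date" => "2023-02-09" }

puts last_modified_for(explicit).iso8601
puts last_modified_for(fallback).strftime("%Y-%m-%d")
```

The first call uses the explicit value; the second falls back to the publish date.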

ayushn21 commented 1 year ago

Yeah, that'd sort it, but you've got rather a lot of blog posts, so it'd be quite tedious and not great for DX. I'll see if I can use the Bridgetown cache to avoid running that command over and over again.

Let's leave setting it explicitly as a last resort.

ayushn21 commented 1 year ago

I spent some more time investigating this. It's not the actual git command that's slow; it's the fact that it has to get the date from disk. Even if I write it to the Bridgetown cache, it makes no difference to performance whatsoever, because it still has to read it off the disk for every file.

There's not a lot I can do here as this is the way the plugin works. I'd recommend continuing with your approach of disabling it in development. I don't think you need to disable the bridgetown-feed plugin as well, just bridgetown-sitemap will do.

Setting the last modified date explicitly should sort this out as the data will already be in memory.

I'm going to close this for now, but I'll think about a workaround for this in development. Maybe there's a configuration setting of some sort I can allow the user to set ... I'll have a ponder.

ayushn21 commented 1 year ago

OMFG I'm such a moron.

I've been testing with bin/bt deploy which clears the bloody cache before building!

Re-tested with bin/bt build and I see a 3 second improvement (~7s down to ~4s) in build times when caching the last modified date. I've just pushed the change to main.

Could you please point your repo to the main branch of this plugin and tell me if you see an improvement in build times? First one will be slow as it needs to write the cache.
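
For readers following along, the caching idea can be sketched roughly as below. This is not the code from the plugin: it swaps the Bridgetown cache for a plain JSON file so the example is self-contained, and `LastModifiedCache` and its `fetch` method are hypothetical names. The expensive lookup (the `git log` call in the plugin) is passed in as a block and only runs on a cache miss.

```ruby
require "json"
require "time"
require "tmpdir"

# Hypothetical disk-backed cache: maps a file path to its last-modified
# timestamp so the expensive lookup runs once per path instead of every build.
class LastModifiedCache
  def initialize(cache_file)
    @cache_file = cache_file
    @entries = File.exist?(cache_file) ? JSON.parse(File.read(cache_file)) : {}
  end

  # Returns the cached Time for `path`, or runs the block to compute and store it.
  def fetch(path)
    unless @entries.key?(path)
      time = yield(path)
      @entries[path] = time&.iso8601
      File.write(@cache_file, JSON.generate(@entries))
    end
    cached = @entries[path]
    cached && Time.parse(cached)
  end
end

cache_file = File.join(Dir.mktmpdir, "last_modified.json")
cache = LastModifiedCache.new(cache_file)

lookups = 0
2.times do
  cache.fetch("src/index.md") do
    lookups += 1 # stands in for the slow per-file git call
    Time.now
  end
end
puts "expensive lookups: #{lookups}"
```

Note that this sketch never invalidates an entry, so an edited file would keep its old date until the cache is cleared. A real implementation would need some invalidation rule.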

joemasilotti commented 1 year ago

After a clean I saw 12.76 seconds followed by 6.78 seconds on a cached build.

➜  bridgetown git:(main) bin/bt clean
           Cleaner: Removing /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/output...
           Cleaner: Nothing to do for /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/.bridgetown-metadata.
           Cleaner: Removing /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/.bridgetown-cache...
           Cleaner: Nothing to do for /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/.bridgetown-cache/frontend-bundling.
➜  bridgetown git:(main) ✗ bin/bt s
[Bridgetown]           Starting: Bridgetown v1.2.0 (codename "Bonny Slope")
[Server] * Puma version: 5.6.5 (ruby 3.1.2-p20) ("Birdie's Version")
[Server] * PID: 14844
[Server] * Listening on http://0.0.0.0:4000
[Server] Use Ctrl-C to stop
[Frontend] touch frontend/styles/jit-refresh.css
[Frontend] yarn run esbuild-dev
[Frontend] yarn run v1.22.19
[Frontend] $ node esbuild.config.js --watch
[Bridgetown]        Environment: development
[Bridgetown]             Source: /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/src
[Bridgetown]        Destination: /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/output
[Bridgetown]     Custom Plugins: /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/plugins
[Bridgetown]         Generating…
[Bridgetown]    Bridgetown Feed: Generating feed for hotwire
[Bridgetown]    Bridgetown Feed: Generating feed for posts
[Bridgetown]         Pagination: disabled. Enable in site config with pagination:\n enabled: true
[Frontend] esbuild: frontend bundling started...
[Bridgetown]            esbuild: There was an error parsing your esbuild manifest file. Please check your esbuild manifest file for any errors.
[Frontend] esbuild: frontend bundling complete!
[Frontend] esbuild: entrypoints processed:
[Frontend]          - index.QOLPUBLY.js: 487B
[Frontend]          - index.JP4YQD63.css: 44.88KB
[Bridgetown]             Done! 🎉 Completed in less than 12.76 seconds.
[Bridgetown]
[Bridgetown]     Now serving at: http://localhost:4000
[Bridgetown]                     http://10.50.100.86:4000
[Bridgetown]
^C[Server] - Gracefully stopping, waiting for requests to finish
[Server] === puma shutdown: 2023-04-04 09:51:04 -0700 ===
[Server] - Goodbye!
[Bridgetown] Stopping auxiliary processes...
➜  bridgetown git:(main) ✗ bin/bt s
[Bridgetown]           Starting: Bridgetown v1.2.0 (codename "Bonny Slope")
[Server] * Puma version: 5.6.5 (ruby 3.1.2-p20) ("Birdie's Version")
[Server] * PID: 15119
[Server] * Listening on http://0.0.0.0:4000
[Server] Use Ctrl-C to stop
[Frontend] touch frontend/styles/jit-refresh.css
[Frontend] yarn run esbuild-dev
[Frontend] yarn run v1.22.19
[Frontend] $ node esbuild.config.js --watch
[Frontend] esbuild: frontend bundling started...
[Bridgetown]        Environment: development
[Bridgetown]             Source: /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/src
[Bridgetown]        Destination: /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/output
[Bridgetown]     Custom Plugins: /Users/joemasilotti/workspace/projects/masilotti.com/bridgetown/plugins
[Bridgetown]         Generating…
[Bridgetown]    Bridgetown Feed: Generating feed for hotwire
[Bridgetown]    Bridgetown Feed: Generating feed for posts
[Bridgetown]         Pagination: disabled. Enable in site config with pagination:\n enabled: true
[Frontend] esbuild: frontend bundling complete!
[Frontend] esbuild: entrypoints processed:
[Frontend]          - index.QOLPUBLY.js: 487B
[Frontend]          - index.JP4YQD63.css: 44.88KB
[Bridgetown]             Done! 🎉 Completed in less than 6.78 seconds.
[Bridgetown]
[Bridgetown]     Now serving at: http://localhost:4000
[Bridgetown]                     http://10.50.100.86:4000
[Bridgetown]
ayushn21 commented 1 year ago

Ok sweet! That's a significant improvement I reckon. I doubt you'd improve much on that even after disabling the plugin?

You happy for me to close this and tag a bugfix release with this fix?

joemasilotti commented 1 year ago

To be honest, I probably still won't use the plugin in development, even with this speed improvement.

My workflow is often changing a single CSS class and waiting for the live-reload to do its thing. Waiting 6 seconds for every tweak breaks that flow entirely.

I think I'd prefer a script that writes these to the front matter based on the git tag. Assuming that would drop build time to less than 1 second.

That said, feel free to release this and say my request is out of scope! I'd totally understand.
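
A script along those lines might look something like the sketch below. It assumes "git tag" here means the last commit date that the plugin already reads via `git log`, and it assumes a `last_modified_at` front matter key; both helper names are made up for this example. Nothing here is part of the plugin.

```ruby
require "open3"

# Hypothetical helper: set (or replace) `key` in the YAML front matter of a
# document, leaving the body untouched. Only handles single-line values.
def upsert_front_matter(content, key, value)
  match = content.match(/\A---[ \t]*\n(.*?\n)?---[ \t]*\n/m)
  return content unless match # no front matter: leave the file alone

  lines = (match[1] || "").lines.reject { |line| line.start_with?("#{key}:") }
  lines << "#{key}: #{value}\n"
  "---\n#{lines.join}---\n#{match.post_match}"
end

# Asks git for the committer date of the last commit touching `path`.
# Returns nil when git is unavailable or the file has no history.
def git_last_commit_date(path)
  out, _err, status = Open3.capture3("git", "log", "-1", "--pretty=format:%cI", "--", path)
  return nil unless status.success? && !out.empty?

  out
rescue Errno::ENOENT
  nil
end

doc = "---\ntitle: Hello\n---\nBody text\n"
puts upsert_front_matter(doc, "last_modified_at", "2023-04-04T09:51:04-07:00")

# Example driver (rewrites files in place, so run it on a clean working tree):
# Dir.glob("src/_posts/*.md").each do |path|
#   date = git_last_commit_date(path) or next
#   File.write(path, upsert_front_matter(File.read(path), "last_modified_at", date))
# end
```

Running the git lookup once in a script like this, rather than on every build, is what would keep it out of the live-reload loop.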

ayushn21 commented 1 year ago

> My workflow is often changing a single CSS class and waiting for the live-reload to do its thing. Waiting 6 seconds for every tweak breaks that flow entirely.

Yeah for sure, still a rather annoying delay ...

> I think I'd prefer a script that writes these to the front matter based on the git tag. Assuming that would drop build time to less than 1 second.

Hmmm, good idea, but definitely out of scope for the plugin itself, although might be something I look at separately at some point. Are you seeing sub-1 second build times with the plugin disabled in dev though?

> That said, feel free to release this

Cool, I'll release this as it's an improvement anyway and ponder how else I can improve this. Maybe it'd be worth allowing a config setting to skip generation in development. It's not something that's needed in dev anyway ... I'll have a think.
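
One possible shape for such a setting, purely as a sketch: the `skip_in_development` option and the `generate_sitemap?` guard below are invented for illustration and are not real plugin options. The idea is that the generator checks the environment and a config value before doing any work.

```ruby
# Hypothetical guard a generator could call before building the sitemap.
# "skip_in_development" is an invented setting name, not a real plugin option.
def generate_sitemap?(env:, config: {})
  return true unless env.to_s == "development"

  # In development, skip by default unless the site opts back in.
  !config.fetch("skip_in_development", true)
end

puts generate_sitemap?(env: "production")
puts generate_sitemap?(env: "development")
puts generate_sitemap?(env: "development", config: { "skip_in_development" => false })
```

Until something like this exists, wrapping the `init` call in `unless Bridgetown.env.development?`, as in the commit earlier in this thread, has the same effect.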

joemasilotti commented 1 year ago

> Are you seeing sub-1 second build times with the plugin disabled in dev though?

With both plugins disabled build times are about 3 seconds. The "under one second" was about as much as I'd like to add to this time. Ideally, everything is under 1 second, though!

And sounds good regarding releasing this.

ayushn21 commented 1 year ago

Fixed by https://github.com/ayushn21/bridgetown-sitemap/commit/d61e5b28b10e7c5df3222df647fa3c8dd64455ac and released in v2.0.1.

ayushn21 commented 1 year ago

> With both plugins disabled build times are about 3 seconds. The "under one second" was about as much as I'd like to add to this time. Ideally, everything is under 1 second, though!

Haha, yeah, that would be ideal! Both plugins iterate over every page, which adds up on large sites, so I'm not sure there's a lot we can do in terms of performance without going down a deep rabbit hole. I'm definitely in favour of using these in production only.