Compile times scale with number of templates

wrapperup commented 1 year ago

Hi, I'm enjoying askama, it's fantastic. I noticed that compile times for templates grow considerably if you have many templates, which I don't think is too surprising considering it's a proc-macro. The main issue is when making incremental changes, askama will recompile all templates, even if they didn't change. This hurts iteration times a bit.

I thought of having templates cached in the user's project (under a .askama directory) without using a build script. Then I saw https://github.com/djc/askama/pull/689, but it doesn't seem to have gone anywhere. Have there been any plans or additional thoughts on continuing that work?

IMO, it would be great if it could be done purely in the proc-macro. In my limited testing, just skipping the compile step already sped things up considerably, so it seems like the overhead of invoking and parsing a proc-macro wasn't big enough to notice (testing with about 100 templates). I saw there were some unstable Rust features for tracking files in proc-macros, which might be useful for this, but who knows how long those would take to stabilize.

djc commented 1 year ago

> The main issue is when making incremental changes, askama will recompile all templates, even if they didn't change. This hurts iteration times a bit.

I find this surprising. Have you verified this? Do you understand why this is the case?

The work in #689 hasn't gone anywhere as far as I know. I still think the code generation at test time would be the optimal approach, but if you have concrete benchmark numbers to demonstrate the benefits of a caching approach I'd be open to considering that as well.

wrapperup commented 1 year ago

So, my test implements 100 templates in a single module, on Windows 10. Also tried on WSL, and I got similar results.

// The derive macro comes from the askama crate.
use askama::Template;

#[derive(Template)]
#[template(path = "index_001.html")]
struct Index001 {
    test: String,
}

...

#[derive(Template)]
#[template(path = "index_100.html")]
struct Index100 {
    test: String,
}

Making a change to just one of these templates results in a compile time of ~3.10s. I ran a self-profile to get a flamegraph, and curiously the proc-macro stage was the longest.

Also in the vanilla case, I stubbed out all of the templates, except for one, and the build time dropped to ~0.40s.

I also stubbed out include_bytes! to see if maybe there were some side effects with file tracking that caused all the macros to run again, but that didn't seem to change anything. (I got similar results to above.)

I made a very quick-and-dirty prototype to cache the compiled template (it hashes the entire proc-macro AST, but not the template itself), and the build time dropped to ~0.43s for one template, and ~1.78s for compiling all the templates.
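For readers who want a picture of the idea, here is a minimal sketch of a hash-keyed on-disk cache. It is not the code from the prototype branch: `cache_key`, `cached_generate`, the cache directory layout, and the `generate` callback are all hypothetical names, and the macro input is treated as a plain string rather than a parsed AST.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;

/// Hypothetical cache key: a hash of the derive macro's input text only.
/// Like the prototype described above, it ignores the template file itself.
/// Note: DefaultHasher output is not guaranteed stable across Rust releases,
/// so a real on-disk cache would likely want a fixed hash algorithm.
fn cache_key(macro_input: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    macro_input.hash(&mut hasher);
    hasher.finish()
}

/// Returns the generated code plus a flag saying whether it came from the cache.
/// `generate` stands in for the expensive template compilation step and only
/// runs on a cache miss.
fn cached_generate<F>(
    cache_dir: &Path,
    macro_input: &str,
    generate: F,
) -> io::Result<(String, bool)>
where
    F: FnOnce(&str) -> String,
{
    let entry = cache_dir.join(format!("{:016x}.rs", cache_key(macro_input)));

    // Cache hit: reuse the previously generated code.
    if let Ok(code) = fs::read_to_string(&entry) {
        return Ok((code, true));
    }

    // Cache miss: generate, then store the result for the next build.
    let code = generate(macro_input);
    fs::create_dir_all(cache_dir)?;
    fs::write(&entry, &code)?;
    Ok((code, false))
}
```

In this sketch, a second call with identical input skips `generate` entirely, which would match the observation that an unchanged template costs almost nothing once cached.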

Flamegraph for one template change with the cache (much better!):

and all:

Honestly, I have no idea why it's faster when compiling all templates with the cache enabled; that part doesn't really make sense to me. If you're curious, I put the prototype implementation here. It's definitely a draft though: https://github.com/wrapperup/askama/tree/cache-templates

And of course, this wasn't a super scientific test, but I think this should be good enough.

djc commented 1 year ago

Okay, so in a crate with 100 templates, what is the time for running cargo check after touching 1 template, with the cache vs. without it?

wrapperup commented 1 year ago

I get similar results. Without the cache, changing 1 template takes ~2.80s. With the cache, it takes ~0.20s for 1 and ~2.82s for all.

Here's my test repo: https://github.com/wrapperup/askama-macro-benchmark

djc commented 1 year ago

Okay, let's have a PR for a cache. Can you somehow include the template contents in the freshness calculation?
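One way to read "include the template contents in the freshness calculation" is to feed both the macro input and the template source into the same hash. The sketch below assumes that reading. `freshness_key` is a hypothetical name, and in a real macro the template source would be read from the path given in the `#[template(...)]` attribute. It also does not account for other templates pulled in through includes or inheritance.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical freshness key covering both the derive input and the
/// template source, so that editing either one invalidates the cache entry.
fn freshness_key(macro_input: &str, template_source: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    // Hashing each string separately keeps ("ab", "c") distinct from
    // ("a", "bc"), because str's Hash impl appends a terminator byte.
    macro_input.hash(&mut hasher);
    template_source.hash(&mut hasher);
    hasher.finish()
}
```

With a key like this, touching one template file would change only that template's key, leaving the other cached entries valid.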

wrapperup commented 1 year ago

Yep! This needs a bit of cleanup, but I can probably have something soonish.

djc / askama

Compile times scale with number of templates #826