Performance: Steep decline between v2 and v5

Dynom commented 4 years ago

I've been doing several proof of concepts and the performance of v5's loading of templates is significantly less than of the v2 branch:

BenchmarkJets
BenchmarkJets/Jetv2
BenchmarkJets/Jetv2-8              48260         24372 ns/op        5040 B/op         87 allocs/op
BenchmarkJets/Jetv5
BenchmarkJets/Jetv5-8               3828        267862 ns/op       36687 B/op        456 allocs/op

The setup

Versions:

github.com/CloudyKit/jet v2.1.2+incompatible
github.com/CloudyKit/jet/v5 v5.1.1

The implementation and templates are identical, with the exception of some differences required for the different Jet versions:

$ find . -type f -name "*.jet" -print -exec diff -Naruw ../jetv2/{} ../jetv5/{} \;

./views/leaderboard.jet
--- ../jetv2/./views/leaderboard.jet    2020-10-20 10:18:04.000000000 +0200
+++ ../jetv5/./views/leaderboard.jet    2020-10-20 12:55:05.000000000 +0200
@@ -1,8 +1,8 @@
-{{import "partials/leaderboard_row.jet"}}
+{{import "/partials/leaderboard_row.jet"}}

 <table>
     <!-- Control flow is a must. This includes iterating over arrays, if/elseif/else statements and ternaries. -->
-    {{range competitor := .Leaderboard}}
+    {{range _, competitor := .Leaderboard}}
     <!-- comments do not appear in the rendered HTML -->
         {* `$competitor` is of type `User` *}
         {{yield leaderboard_row(user=competitor)}}
./views/index.jet
--- ../jetv2/./views/index.jet  2020-10-20 10:18:04.000000000 +0200
+++ ../jetv5/./views/index.jet  2020-10-20 12:45:10.000000000 +0200
@@ -1,6 +1,6 @@
-{{extends "layouts/main.jet"}}
+{{extends "/layouts/main.jet"}}

-{{import "partials/ui.jet"}}
+{{import "/partials/ui.jet"}}

 {{block body()}}
     <script type="text/javascript">
./views/layouts/main.jet
./views/layouts/header.jet
--- ../jetv2/./views/layouts/header.jet 2020-10-26 09:09:56.000000000 +0100
+++ ../jetv5/./views/layouts/header.jet 2020-10-20 12:45:03.000000000 +0200
@@ -1,4 +1,4 @@
-{{import "partials/ui.jet"}}
+{{import "/partials/ui.jet"}}

 <div id="header">
     <div id="menu">
./views/layouts/footer.jet
./views/partials/ui.jet
./views/partials/leaderboard_row.jet

The files:

.
├── jet.go
└── views
    ├── index.jet
    ├── layouts
    │   ├── footer.jet
    │   ├── header.jet
    │   └── main.jet
    ├── leaderboard.jet
    └── partials
        ├── leaderboard_row.jet
        └── ui.jet

The benchmark

import (
    jetversion2 "github.com/CloudyKit/jet"
    jetversion5 "github.com/CloudyKit/jet/v5"
    // ..
)

func BenchmarkJets(b *testing.B) {
    if wd, err := os.Getwd(); err == nil && filepath.Base(wd) == "template" {
        _ = os.Chdir("..")
    }

    tr := reflect.ValueOf(template.TranslateFunc("en"))
    data := template.ExampleContext()

    b.Run("Jetv2", func(b *testing.B) {

        views := jetversion2.NewHTMLSet("./template/jetv2/views")
        views.SetDevelopmentMode(false)

        for i := 0; i < b.N; i++ {
            t, err := views.GetTemplate("index.jet")
            if err != nil {
                b.Errorf("Error in benchmark %s", err)
            }

            err = t.Execute(ioutil.Discard, jetversion2.VarMap{"T": tr}, data)
            if err != nil {
                b.Errorf("Error in benchmark %s", err)
            }
        }
    })

    b.Run("Jetv5", func(b *testing.B) {

        views := jetversion5.NewHTMLSet("./template/jetv5/views")
        views.SetDevelopmentMode(false)
        views.SetExtensions([]string{""}) // v5 specific optimisation

        for i := 0; i < b.N; i++ {
            t, err := views.GetTemplate("index.jet")
            if err != nil {
                b.Errorf("Error in benchmark %s", err)
            }

            err = t.Execute(ioutil.Discard, jetversion5.VarMap{"T": tr}, data)
            if err != nil {
                b.Errorf("Error in benchmark %s", err)
            }
        }
    })
}

Considerations

I noticed quite a bit of time spent in io (Loader.Exists). I wrote a quick-n-dirty Loader that pre-fetches all *.jet files and adds them to a map for faster subsequent lookups. This reduces the execution time to 17 ms (down from 267ms). But I think it only masks the problem, instead of properly solving it.

I noticed many invocations to Loader.Exists and Loader.Load() for already observed files. My guess so far is that some state should be preserved in func (st *Runtime) executeList when st.executeInclude() is invoked. Or that the way file paths are canonicalised should be more in line with how templates can be used in version 5. e.g.:

{{ include "./index.jet" }}
{{ include "/index.jet" }}
{{ include "index.jet" }}

might all point to the same file.

Screenshot 2020-10-26 at 16 38 55 Screenshot 2020-10-26 at 16 40 21

Further findings

After I've plugged a different Loader I managed to get it down to the following. However I think this isn't the approach to take. I think the lazy-loading approach is probably a fine route to take, but right now it appears to not resolving efficiently.

BenchmarkJets
BenchmarkJets/Jetv2
BenchmarkJets/Jetv2-8              49384         23685 ns/op        5041 B/op         87 allocs/op
BenchmarkJets/Jetv5
BenchmarkJets/Jetv5-8              69686         17440 ns/op        5091 B/op         72 allocs/op

sauerbraten commented 4 years ago

Thanks for the detailed report!

I opened #182, which I think fixes this. It would be great if you could run your benchmark on that and report back if this improves the v5 results?

If you like, we can also incorporate your benchmark setup in this repository for future reference!

Dynom commented 4 years ago

Hi,

Thanks for your fast reply!

I've looked at your PR and placed a few comments. I'm not familiar enough with Jet to give any real feedback however.

I'm not entirely sure if a faster loader is the best solution, I think that preventing the lookups would be an even better approach. If anything the relatively slow disk I/O might help to illustrate the problem quite nicely.

sauerbraten commented 4 years ago

I didn't fix it using a faster loader, I eliminated a code path you exposed in your benchmarks using GetTemplate() where the parsed template (AST) was not cached in the Set after parsing, so subsequent calls to the same function had to open the template file and parse it all over again.

I'm pretty sure this is the major difference to v2: v2 cached the template correctly after parsing it once. If you can upload a zip containing you benchmark data, I'll run the benchmark myself to check whether my guess is correct.

Dynom commented 4 years ago

Ah you're right 👌

When running against your branch:

BenchmarkJets
BenchmarkJets/Jetv2_________
BenchmarkJets/Jetv2_________-8             50058         23629 ns/op        5041 B/op         87 allocs/op
BenchmarkJets/Jetv5-sauerbraten
BenchmarkJets/Jetv5-sauerbraten-8          42816         27449 ns/op        6028 B/op        157 allocs/op

I'm not sure if removing the public set.Parse is breaking things for others. I used it in a test to verify correctness and it might be a breaking change for other leaning on it. It's clearly documented that it doesn't store it in cache, which was great for using it in tests.

If this is still the way forward I'll remove it from my tests, it's nothing I heavily lean on right now.

sauerbraten commented 4 years ago

Thanks for verifying my PR, those numbers look ok, even though I'm sure we can still optimize at some point!

Oh, removing those methods is definitely breaking, and I'm not yet sure we will. It's good to know the problem was where I thought it was, though. The thing with Set.Parse() at the moment is that while it will not add the template you parse to the cache, it will cache all templates that are included or extended by the parsed template. I'll have to find a clean way to make sure those will be treated like the original template before re-adding Parse().

summerblue commented 4 years ago

@sauerbraten Can u please create a new tag for this issue?

summerblue commented 4 years ago

FYI —— https://dev.to/kataras/what-s-the-fastest-template-parser-in-go-4bal

Dynom commented 4 years ago

FYI: https://github.com/CloudyKit/jet/issues/178

Please ignore anything from kataras.

sauerbraten commented 3 years ago

@summerblue: definitely, but I want to merge https://github.com/CloudyKit/jet/pull/183 since we now have breaking changes anyway. I'll do that and then prepare the v6 release.

sauerbraten commented 3 years ago

@summerblue: https://github.com/CloudyKit/jet/releases/tag/v6.0.0

CloudyKit / jet