Keats / tera2


Improve performance #4

Open Keats opened 1 year ago

Keats commented 1 year ago

The big-table especially is slower than I'd expect.

Keats commented 1 year ago

I'm not entirely sure how to speed it up. I've tried using an iterator and getting by index every time, and it doesn't move the needle.

jalil-salame commented 1 year ago

How do you do the perf comparison? Do you just run the benches and compare?

A documented workflow would help people who want to contribute

Keats commented 1 year ago

Yep, it's just some Criterion benches: run cargo bench big-table and see if it improves.

jalil-salame commented 1 year ago

On big-table, tera seems to spend ~30-40% of the time in fmt::write, so I don't think there is much to be done optimizing that further in isolation. Maybe trying to reduce the size of Instruction could pay off?

For example, if a String or a Vec will not grow in size, consider using a Box<str> or Box<[T]> instead (two words instead of three).
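The size difference is easy to verify: String and Vec carry a pointer, length, and capacity, while the boxed slice types are fat pointers with only a pointer and length. A quick check:

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();
    // String and Vec<T> store (ptr, len, capacity): three words.
    assert_eq!(size_of::<String>(), 3 * word);
    assert_eq!(size_of::<Vec<u8>>(), 3 * word);
    // Box<str> and Box<[T]> are fat pointers (ptr, len): two words.
    assert_eq!(size_of::<Box<str>>(), 2 * word);
    assert_eq!(size_of::<Box<[u8]>>(), 2 * word);
    println!("String: {} bytes, Box<str>: {} bytes",
             size_of::<String>(), size_of::<Box<str>>());
}
```

The conversion is cheap when the buffer is already exactly sized (String::into_boxed_str shrinks to fit), so it works well for data that is frozen after compilation.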

Maybe also provide a bigger benchmark (a real-world template that takes a long time to execute).

Keats commented 1 year ago

I was thinking of using str in Value as well if possible. To be honest, I think the big-table template is kind of the worst-case template. I will try to add some more realistic benchmarks (e.g. with inheritance, macros, loops, set, etc.)

Keats commented 1 year ago

More realistic set of templates: https://github.com/Keats/tera2/pull/11

Keats commented 10 months ago

Another thing: we should probably keep Arc in Key, since otherwise we clone the string every time we look up an attribute, I believe...?

Keats commented 10 months ago

Another way to speed things up is to do what ramhorns is doing: pre-compute hashes for all struct fields. Ramhorns uses a derive, but we could pre-compute the hash when compiling the templates and serializing to Value, avoiding the derive. We would need to differentiate between dynamic and static maps/structs, but in practice I'm guessing most people pass structs to the Context, so we should be able to skip most hashmap lookups (well, we can just have a NoHashHasher or something for the HashMap with pre-computed hashes).
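A minimal sketch of the idea, with hypothetical names (Key, NoHash, and an illustrative FNV-1a stand-in; tera2's actual types will differ): the key carries a hash computed once at template-compile time, and the map's hasher just passes it through, so no hashing happens at render time.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Pass-through hasher: the key already carries a precomputed hash.
#[derive(Default)]
struct NoHash(u64);

impl Hasher for NoHash {
    fn finish(&self) -> u64 { self.0 }
    fn write(&mut self, _: &[u8]) { unreachable!("keys carry precomputed hashes") }
    fn write_u64(&mut self, n: u64) { self.0 = n; }
}

// Hypothetical key: hash computed once when the template is compiled.
#[derive(PartialEq, Eq)]
struct Key { hash: u64, name: &'static str }

impl std::hash::Hash for Key {
    fn hash<H: Hasher>(&self, state: &mut H) { state.write_u64(self.hash) }
}

// Stand-in hash function (FNV-1a) for illustration only.
fn fnv1a(s: &str) -> u64 {
    s.bytes()
        .fold(0xcbf29ce484222325u64, |h, b| (h ^ b as u64).wrapping_mul(0x100000001b3))
}

fn main() {
    let mut map: HashMap<Key, i32, BuildHasherDefault<NoHash>> = HashMap::default();
    map.insert(Key { hash: fnv1a("title"), name: "title" }, 1);
    // The lookup key reuses the hash computed at compile time.
    let k = Key { hash: fnv1a("title"), name: "title" };
    assert_eq!(map.get(&k), Some(&1));
}
```

The Eq impl still compares names, so hash collisions stay correct; only the hashing work moves from render time to compile time.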

I don't know when i'll have the time to try that as it can be a potentially big change. What do you think @jalil-salame ?

jalil-salame commented 10 months ago

I need to take a look at the profile and see how much time hashing takes up... My guess is we could make a Cow-like type but for hashing: either precomputed or lazily computed.

Hashing did not take a significant amount of time, but it could unlock optimizations by reducing the amount of code in the precomputed hash path.
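The "Cow but for hashing" idea above could look something like this (a hypothetical sketch, not tera2 code): static struct fields carry a precomputed hash, while dynamic map keys compute and cache theirs on first use.

```rust
use std::cell::Cell;

// Hypothetical "Cow for hashing": precomputed for static struct fields,
// computed once on demand for dynamic maps.
enum LazyHash {
    Precomputed(u64),
    Lazy(Cell<Option<u64>>),
}

impl LazyHash {
    fn get(&self, compute: impl FnOnce() -> u64) -> u64 {
        match self {
            LazyHash::Precomputed(h) => *h,
            LazyHash::Lazy(cell) => match cell.get() {
                Some(h) => h,
                None => {
                    let h = compute();
                    cell.set(Some(h)); // cache for later lookups
                    h
                }
            },
        }
    }
}

fn main() {
    let pre = LazyHash::Precomputed(42);
    assert_eq!(pre.get(|| unreachable!("already precomputed")), 42);

    let lazy = LazyHash::Lazy(Cell::new(None));
    assert_eq!(lazy.get(|| 7), 7);
    assert_eq!(lazy.get(|| unreachable!("cached after first call")), 7);
}
```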

Keats commented 10 months ago

Maybe ramhorns is doing some other optimizations as well, but it does beat askama, which is compiled to Rust, so it's doing something right!

Keats commented 8 months ago

@jalil-salame did you end up having a look at it?

jalil-salame commented 8 months ago

> @jalil-salame did you end up having a look at it?

Not really, I have been busy :c. I don't think I'll have time to look at it for the next month or two either...

Keats commented 8 months ago

No worries, I barely have time myself.