Keats opened 1 year ago
I'm not entirely sure how to speed it up. I've tried using an iterator as well as indexing by idx every time, and neither moves the needle.
How do you do the perf comparison? Do you just run the benches and compare?
A documented workflow would help people who want to contribute.
Yep, it's just a criterion bench: run `cargo bench big-table` and see if it improves.
On `big-table`, tera seems to spend ~30-40% of the time in `fmt::write`, so I don't think there is much to be done optimizing that further in isolation. Maybe trying to reduce the size of `Instruction` could pay off? For example, if a `String` or a `Vec` will not grow in size, consider using a `Box<str>` or `Box<[T]>` instead (2 words instead of 3).
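The size difference is easy to check: on a 64-bit target, `String` and `Vec<T>` carry a capacity word that `Box<str>` and `Box<[T]>` drop. A minimal sketch:

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();
    // String = pointer + length + capacity (3 words)
    assert_eq!(size_of::<String>(), 3 * word);
    // Box<str> = pointer + length only (2 words); it can never grow
    assert_eq!(size_of::<Box<str>>(), 2 * word);
    // Same trade-off for Vec<T> vs Box<[T]>
    assert_eq!(size_of::<Vec<u8>>(), 3 * word);
    assert_eq!(size_of::<Box<[u8]>>(), 2 * word);
    // Converting is a one-liner once the value is final:
    let s: Box<str> = String::from("hello").into_boxed_str();
    assert_eq!(&*s, "hello");
    println!("ok");
}
```

Shrinking `Instruction` this way mostly helps cache locality when the bytecode is iterated during rendering.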
Maybe also provide a bigger benchmark (a real-world template that takes a long time to execute).
I was thinking of using `str` in `Value` as well, if possible.
To be honest, I think the big-table template is kind of the worst-case template. I will try to add some more realistic benchmarks (e.g. with inheritance, macros, loops, `set`, etc.).
More realistic set of templates: https://github.com/Keats/tera2/pull/11
Another thing: we should probably keep `ArcKey`, since otherwise we clone the string every time we look up an attribute, I believe?
Another way to speed things up is to do what ramhorns does: pre-compute hashes for all struct fields. Ramhorns uses a derive, but we could pre-compute the hash when compiling the templates and serializing to `Value`, avoiding the derive. We would need to differentiate between dynamic and static maps/structs, but in practice I'm guessing most people pass structs to the `Context`, so we should be able to skip most hashmap lookups (well, we can just have a `NoHashHasher` or something for the `HashMap` with a pre-computed hash).
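The pass-through-hasher part of that idea can be sketched without any external crate. Everything below (`PrehashedKey`, `PassThroughHasher`) is a hypothetical illustration, not tera2's actual types; the hash values would come from template compilation rather than being hardcoded:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

// A key carrying a hash precomputed at template-compile time (hypothetical name).
#[derive(PartialEq, Eq)]
struct PrehashedKey {
    hash: u64,
    name: &'static str,
}

impl Hash for PrehashedKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed the stored hash instead of re-hashing the string bytes.
        state.write_u64(self.hash);
    }
}

// Hasher that just passes the u64 through, like NoHashHasher.
#[derive(Default)]
struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
    fn finish(&self) -> u64 { self.0 }
    fn write(&mut self, _bytes: &[u8]) {
        unreachable!("PrehashedKey only calls write_u64")
    }
    fn write_u64(&mut self, n: u64) { self.0 = n; }
}

type PrehashedMap<V> = HashMap<PrehashedKey, V, BuildHasherDefault<PassThroughHasher>>;

fn main() {
    let mut fields: PrehashedMap<i32> = PrehashedMap::default();
    // In the real thing these hashes would be computed once while compiling
    // the template; fixed values here purely for illustration.
    fields.insert(PrehashedKey { hash: 1, name: "title" }, 42);
    assert_eq!(fields[&PrehashedKey { hash: 1, name: "title" }], 42);
    println!("ok");
}
```

Lookups then cost a single u64 copy instead of hashing the attribute string on every access, which is exactly where the savings come from when the same fields are read in a loop.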
I don't know when i'll have the time to try that as it can be a potentially big change. What do you think @jalil-salame ?
I need to take a look at the profile and see how much time hashing is taking up... My guess is we could make a `Cow`-like type, but for hashing: either precomputed or lazily computed.
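That "`Cow` but for hashing" idea could look roughly like this. `MaybeHashed` is a hypothetical name, and `DefaultHasher` stands in for whatever hasher the crate actually uses; the point is just the precomputed-or-lazy split:

```rust
use std::cell::Cell;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash is either supplied up front (static/compiled keys)
// or computed once on first use (dynamic keys), then cached.
struct MaybeHashed<'a> {
    key: &'a str,
    hash: Cell<Option<u64>>,
}

impl<'a> MaybeHashed<'a> {
    fn precomputed(key: &'a str, hash: u64) -> Self {
        Self { key, hash: Cell::new(Some(hash)) }
    }

    fn lazy(key: &'a str) -> Self {
        Self { key, hash: Cell::new(None) }
    }

    fn get_hash(&self) -> u64 {
        if let Some(h) = self.hash.get() {
            return h; // precomputed or already-cached path
        }
        let mut hasher = DefaultHasher::new();
        self.key.hash(&mut hasher);
        let h = hasher.finish();
        self.hash.set(Some(h)); // cache for subsequent lookups
        h
    }
}

fn main() {
    let lazy = MaybeHashed::lazy("title");
    let h = lazy.get_hash(); // computed on demand
    let pre = MaybeHashed::precomputed("title", h);
    // The precomputed path returns the same value without re-hashing.
    assert_eq!(pre.get_hash(), h);
    println!("ok");
}
```

Both paths then funnel into the same lookup code, which is the "reducing the amount of code in the precomputed hash path" angle mentioned below.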
Hashing did not take a significant amount of time, but it could unlock optimizations by reducing the amount of code in the precomputed hash path.
Maybe there are some other optimizations ramhorns is doing, but it beats askama, which compiles to Rust, so it's clearly doing something right!
@jalil-salame did you end up having a look at it?
Not really, I have been busy :c. I don't think I'll have time to look at it for the next month or two either...
No worries, I barely have time myself.
The big-table especially is slower than I'd expect.