Closed peterzhu2118 closed 4 years ago
The benchmark results don't seem right. I would expect parse time to be significantly slower due to having to copy the immediate argument for OP_WRITE_RAW. Rendering also doesn't seem more efficient, since the string length needs to be decoded. Did you get the benchmark results mixed up?
On master I got this for the benchmark
parse: 156.458 (± 3.2%) i/s - 1.575k in 10.077279s
render: 207.469 (± 5.3%) i/s - 2.080k in 10.058117s
parse & render: 83.494 (± 3.6%) i/s - 840.000 in 10.070280s
and on this branch (rebased) I got
parse: 147.837 (± 4.1%) i/s - 1.484k in 10.053717s
render: 203.815 (± 5.4%) i/s - 2.033k in 10.009205s
parse & render: 80.104 (± 3.7%) i/s - 800.000 in 10.000488s
which seems in line with what I expected.
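To make the comparison concrete, here is a small sketch of how the relative change between two iterations-per-second figures can be computed. The helper name `percent_change` is made up for illustration and is not part of the benchmark tooling.

```c
/* Relative change from a baseline rate to a new rate, in percent.
 * Negative values mean the new rate is slower than the baseline.
 * Hypothetical helper, not part of any benchmark library. */
static double percent_change(double baseline_ips, double new_ips)
{
    return (new_ips - baseline_ips) / baseline_ips * 100.0;
}
```

With the numbers above, `percent_change(156.458, 147.837)` comes out at roughly -5.5 for parse and `percent_change(207.469, 203.815)` at roughly -1.8 for render.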
I was hoping to avoid slowing down rendering with this change. I think we could have a more efficient OP_WRITE_RAW instruction for a byte-sized length argument to make rendering more comparable to what we have on master.
Have you tried using a shorter OP_WRITE_RAW instruction to fix the performance regression? How does that affect the benchmark results?
@dylanahsmith I just changed OP_WRITE_RAW to use a 1-byte length argument and added an OP_WRITE_RAW_W that uses a 3-byte one. I think it's reduced the performance impact.
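A minimal sketch of the narrow/wide split, assuming the length follows the opcode directly and the 3-byte length is stored big-endian. The opcode values, the byte order, and the helper names `emit_write_raw` and `decode_write_raw` are assumptions for illustration, not the actual liquid-c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode values, chosen only for this sketch. */
enum { OP_WRITE_RAW = 1, OP_WRITE_RAW_W = 2 };

/* Append a raw-string instruction to buf and return the bytes written.
 * Lengths up to 255 use the narrow form (1-byte length); longer strings
 * use the wide form with a 3-byte (24-bit) big-endian length. */
static size_t emit_write_raw(uint8_t *buf, const char *str, size_t len)
{
    size_t n = 0;
    assert(len <= 0xFFFFFF); /* must fit in 24 bits */
    if (len <= UINT8_MAX) {
        buf[n++] = OP_WRITE_RAW;
        buf[n++] = (uint8_t)len;
    } else {
        buf[n++] = OP_WRITE_RAW_W;
        buf[n++] = (uint8_t)(len >> 16);
        buf[n++] = (uint8_t)(len >> 8);
        buf[n++] = (uint8_t)len;
    }
    memcpy(buf + n, str, len); /* the string is stored inline */
    return n + len;
}

/* Decode the instruction at ip: set *str and *len, and return a pointer
 * to the next instruction. The narrow form needs a single byte load. */
static const uint8_t *decode_write_raw(const uint8_t *ip,
                                       const char **str, size_t *len)
{
    if (*ip++ == OP_WRITE_RAW) {
        *len = *ip++;
    } else {
        *len = ((size_t)ip[0] << 16) | ((size_t)ip[1] << 8) | ip[2];
        ip += 3;
    }
    *str = (const char *)ip;
    return ip + *len;
}
```

The size check happens once in `emit_write_raw`, at parse time, so the render loop only pays for the shifts when a string is actually longer than 255 bytes.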
This branch after the change:
parse: 160.329 (± 6.2%) i/s - 1.605k in 10.049573s
render: 232.030 (± 7.3%) i/s - 2.318k in 10.050578s
parse & render: 85.336 (± 5.9%) i/s - 856.000 in 10.070809s
This branch before the change:
parse: 156.611 (± 6.4%) i/s - 1.560k in 10.006603s
render: 226.787 (± 7.5%) i/s - 2.266k in 10.054463s
parse & render: 84.311 (± 7.1%) i/s - 840.000 in 10.008604s
Master:
parse: 161.068 (± 6.2%) i/s - 1.605k in 10.000922s
render: 235.662 (± 6.8%) i/s - 2.346k in 10.003183s
parse & render: 86.112 (± 5.8%) i/s - 864.000 in 10.075406s
Rendering time seems almost unaffected w/ the wide instruction. I think we should be "ignoring" the parse speed when benchmarking since the goal is to cache the serialized template.
Moving to wide instructions is effectively moving an operation from rendering to the parsing phase. So parse time is affected. But that will be gone when compiled templates are serialized. So we should be moving as much as possible to the parsing phase if it improves rendering time.
Implement OP_WRITE_RAW as an immediate instruction. The current implementation stores the size as a 24-bit unsigned integer after the instruction, and the size number of bytes following it are the string. I also added an OP_WRITE_RAW_SKIP instruction that skips the string. This instruction is used in BlockBody#remove_blank_strings to replace the OP_WRITE_RAW instruction. I'm not entirely satisfied with adding an extra instruction to handle this corner case. I've thought of two other ways to implement this:
[...] OP_WRITE_RAW_SKIP instruction but will require an extra 8 bytes.
[...] c_buffer that allows deleting sections from the middle. This is probably the cleanest solution but will have the performance overhead of memmove.
Also see #77.
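The skip approach described above can be sketched roughly as follows: blanking a string only overwrites its opcode, so the length and string bytes stay where they are and nothing after them has to move. The opcode values, the big-endian length, and the names `read_len24`, `blank_raw_string`, and `render` are assumptions made for this sketch, not the actual liquid-c code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode values for this sketch only. */
enum { OP_LEAVE = 0, OP_WRITE_RAW = 1, OP_WRITE_RAW_SKIP = 2 };

/* Read the 24-bit length stored after the opcode (byte order assumed). */
static size_t read_len24(const uint8_t *p)
{
    return ((size_t)p[0] << 16) | ((size_t)p[1] << 8) | p[2];
}

/* Blank out a raw string in place by overwriting its opcode. The length
 * and string bytes are left untouched, so no memmove is needed. */
static void blank_raw_string(uint8_t *ip)
{
    if (*ip == OP_WRITE_RAW)
        *ip = OP_WRITE_RAW_SKIP;
}

/* Walk the instruction stream, copying written strings into out.
 * Only the three opcodes above are handled. Returns bytes written. */
static size_t render(const uint8_t *ip, char *out)
{
    size_t written = 0;
    for (;;) {
        uint8_t op = *ip++;
        if (op == OP_LEAVE)
            return written;
        size_t len = read_len24(ip);
        ip += 3;
        if (op == OP_WRITE_RAW) {
            memcpy(out + written, ip, len);
            written += len;
        }
        ip += len; /* both opcodes step over the inline string */
    }
}
```

The cost is that a skipped string still occupies space in the instruction stream, which is the trade-off against a buffer that supports deleting from the middle.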
Benchmarks
Base branch:
This branch: