Open mmastrac opened 1 year ago
@mmastrac It seems this has been fixed in main?
test bench_op_string_large_utf8_1000000 ... bench: 15,772,187 ns/iter (+/- 462,927)
...
test bench_op_string_old_large_utf8_1000000 ... bench: 20,796,803 ns/iter (+/- 354,834)
The UTF-8 one is faster with op2, but for some reason the ASCII one is not. The benchmark has improved on main, but is still slower (~68% by the trimmed numbers below: 790,843 vs 471,671 ns/iter).
Trimmed recent benchmark:
test bench_op_string_large_1000000 ... bench: 790,843 ns/iter (+/- 28,126)
test bench_op_string_old_large_1000000 ... bench: 471,671 ns/iter (+/- 70,252)
I wonder if we're just falling off some SIMD/autovectorization fast path?
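To make the autovectorization hypothesis concrete, here is a minimal standalone Rust sketch (not deno_core code) contrasting a bulk slice copy, which the compiler lowers to memcpy/SIMD, with a per-byte loop containing a branch, which can defeat vectorization. The branch models a per-character validity/encoding check; the Latin-1-to-UTF-8 expansion arm is purely illustrative.

```rust
// Bulk copy: extend_from_slice lowers to an optimized memcpy,
// which the backend can implement with wide SIMD loads/stores.
fn copy_bulk(src: &[u8], dst: &mut Vec<u8>) {
    dst.extend_from_slice(src);
}

// Per-byte copy with a branch per iteration. The branch models a
// per-character check (here: "is this byte ASCII?"); data-dependent
// control flow like this can prevent autovectorization.
fn copy_per_byte(src: &[u8], dst: &mut Vec<u8>) {
    for &b in src {
        if b < 0x80 {
            dst.push(b);
        } else {
            // Latin-1 byte -> 2-byte UTF-8 sequence.
            dst.push(0xC0 | (b >> 6));
            dst.push(0x80 | (b & 0x3F));
        }
    }
}

fn main() {
    // Mirror the LARGE_1000000 shape: 1,000,000 ASCII bytes.
    let src = vec![b'a'; 1_000_000];
    let mut a = Vec::with_capacity(src.len());
    let mut b = Vec::with_capacity(src.len());
    copy_bulk(&src, &mut a);
    copy_per_byte(&src, &mut b);
    // For pure ASCII input, both strategies produce identical bytes.
    assert_eq!(a, b);
    println!("ok: {} bytes", a.len());
}
```

If the per-byte path is what the op2 string copy compiles to, that alone could explain a large constant-factor gap on a 1 MB ASCII string.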
Ah ok, here's the profile for each one:
bench_op_string_large_1000000
- https://share.firefox.dev/44fuKRX
bench_op_string_old_large_1000000
- https://share.firefox.dev/44WKIBf
It seems the fast call path is not taken in either case. The other difference is that the old op uses WriteUtf8, whereas we use WriteOneByte.
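For context, here is a hypothetical plain-Rust sketch of the two extraction strategies the profile suggests (the names mirror the V8 APIs, but this is not the actual deno_core or rusty_v8 code). A WriteOneByte-style extraction copies the string's Latin-1 backing store directly; that result is valid UTF-8 only when every byte is ASCII, so a check or conversion pass is needed. A WriteUtf8-style extraction produces UTF-8 in one pass.

```rust
// Hypothetical model of WriteOneByte-style extraction: `latin1` stands in
// for the one-byte backing store of a V8 string.
fn extract_one_byte(latin1: &[u8]) -> String {
    if latin1.is_ascii() {
        // ASCII fast path: Latin-1 bytes are already valid UTF-8,
        // so a straight copy suffices.
        String::from_utf8(latin1.to_vec()).unwrap()
    } else {
        // Non-ASCII Latin-1 bytes must be re-encoded: each byte >= 0x80
        // becomes a 2-byte UTF-8 sequence. (u8 as char maps a Latin-1
        // byte to the Unicode scalar with the same value.)
        latin1.iter().map(|&b| b as char).collect()
    }
}

fn main() {
    assert_eq!(extract_one_byte(b"hello"), "hello");
    assert_eq!(extract_one_byte(&[0xE9]), "\u{e9}"); // Latin-1 0xE9 -> 'é'
    println!("ok");
}
```

The point of the sketch: even when both paths end in a copy, the one-byte route carries an extra scan or branch per string, which may matter at 1,000,000 characters.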
op2 strings are faster in every case other than LARGE_1000000 (1,000,000 ASCII characters). We need to investigate why.