Open daulet opened 1 year ago
We've regressed quite a bit in benchmarks since the initial release.
benchstat benchmarks/3188ded27885d1002698a0e25f0b32306c430e88.txt benchmarks/$(git rev-parse HEAD).txt

goos: darwin
goarch: arm64
pkg: github.com/daulet/tokenizers
                 │ benchmarks/3188ded27885d1002698a0e25f0b32306c430e88.txt │ benchmarks/38a9a14c1c56b113461b0c7350c72de949e23cc2.txt │
                 │ sec/op                                                  │ sec/op           vs base                                │
EncodeNTimes-10    11.99µ ±  3%                                              13.11µ ±   1%      +9.39% (p=0.002 n=6)
EncodeNChars-10    2.584n ±  8%                                              2.989n ± 272%           ~ (p=0.485 n=6)
DecodeNTimes-10    1.701µ ±  3%                                              4.535µ ±   2%    +166.66% (p=0.002 n=6)
DecodeNTokens-10   193.6n ± 10%                                              656.1n ±   3%    +238.78% (p=0.002 n=6)
geomean            317.8n                                                    584.3n            +83.86%

                 │ benchmarks/3188ded27885d1002698a0e25f0b32306c430e88.txt │ benchmarks/38a9a14c1c56b113461b0c7350c72de949e23cc2.txt │
                 │ B/op                                                    │ B/op             vs base                                │
EncodeNTimes-10    84.00 ± 0%                                                232.00 ± 0%      +176.19% (p=0.002 n=6)
EncodeNChars-10    0.000 ± 0%                                                0.000 ± 0%              ~ (p=1.000 n=6) ¹
DecodeNTimes-10    96.00 ± 0%                                                96.00 ± 0%              ~ (p=1.000 n=6) ¹
DecodeNTokens-10   7.000 ± 0%                                                7.000 ± 0%              ~ (p=1.000 n=6) ¹
geomean                       ²                                                                +28.91%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                 │ benchmarks/3188ded27885d1002698a0e25f0b32306c430e88.txt │ benchmarks/38a9a14c1c56b113461b0c7350c72de949e23cc2.txt │
                 │ allocs/op                                               │ allocs/op        vs base                                │
EncodeNTimes-10    4.000 ± 0%                                                12.000 ± 0%      +200.00% (p=0.002 n=6)
EncodeNChars-10    0.000 ± 0%                                                0.000 ± 0%              ~ (p=1.000 n=6) ¹
DecodeNTimes-10    3.000 ± 0%                                                3.000 ± 0%              ~ (p=1.000 n=6) ¹
DecodeNTokens-10   0.000 ± 0%                                                0.000 ± 0%              ~ (p=1.000 n=6) ¹
geomean                       ²                                                                +31.61%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean
CC @clems4ever @RJKeevil in case you'd be interested in looking into this.
I actually root caused it to this commit in the upstream library.