daulet / tokenizers

Go bindings for HuggingFace Tokenizer
MIT License

Update to huggingface/tokenizers v0.20.0 #23

Closed by daulet 3 months ago

daulet commented 3 months ago

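The performance improvements from the v0.20.0 release are negligible. No benchmark shows a statistically significant change in sec/op (every p-value is above 0.05), and B/op and allocs/op are identical: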

benchstat test/benchmark/7bb47dd52e68ae3349c0461d494921d6a07f7181.txt test/benchmark/1b502b65573ea00125eac62fa301c480402be19c.txt
goos: darwin
goarch: arm64
pkg: github.com/daulet/tokenizers
                 │ test/benchmark/7bb47dd52e68ae3349c0461d494921d6a07f7181.txt │ test/benchmark/1b502b65573ea00125eac62fa301c480402be19c.txt │
                 │                           sec/op                            │                sec/op                 vs base               │
EncodeNTimes-10                                                   12.66µ ±  1%                            12.68µ ± 4%       ~ (p=0.579 n=10)
EncodeNChars-10                                                   2.264n ± 37%                            2.185n ± 9%       ~ (p=0.739 n=10)
DecodeNTimes-10                                                   4.506µ ±  2%                            4.465µ ± 1%       ~ (p=0.138 n=10)
DecodeNTokens-10                                                  649.2n ±  3%                            655.2n ± 2%       ~ (p=0.105 n=10)
geomean                                                           538.1n                                  533.5n       -0.85%

                 │ test/benchmark/7bb47dd52e68ae3349c0461d494921d6a07f7181.txt │ test/benchmark/1b502b65573ea00125eac62fa301c480402be19c.txt │
                 │                            B/op                             │                B/op                 vs base                 │
EncodeNTimes-10                                                   232.0 ± 0%                             232.0 ± 0%       ~ (p=1.000 n=10) ¹
EncodeNChars-10                                                   0.000 ± 0%                             0.000 ± 0%       ~ (p=1.000 n=10) ¹
DecodeNTimes-10                                                   96.00 ± 0%                             96.00 ± 0%       ~ (p=1.000 n=10) ¹
DecodeNTokens-10                                                  7.000 ± 0%                             7.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                                      ²                                       +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                 │ test/benchmark/7bb47dd52e68ae3349c0461d494921d6a07f7181.txt │ test/benchmark/1b502b65573ea00125eac62fa301c480402be19c.txt │
                 │                          allocs/op                          │             allocs/op               vs base                 │
EncodeNTimes-10                                                   12.00 ± 0%                             12.00 ± 0%       ~ (p=1.000 n=10) ¹
EncodeNChars-10                                                   0.000 ± 0%                             0.000 ± 0%       ~ (p=1.000 n=10) ¹
DecodeNTimes-10                                                   3.000 ± 0%                             3.000 ± 0%       ~ (p=1.000 n=10) ¹
DecodeNTokens-10                                                  0.000 ± 0%                             0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                                      ²                                       +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean