ceifa / tiktoken-node

OpenAI's tiktoken but with node bindings

How does this compare to @dqbd/tiktoken #12

Closed · transitive-bullshit closed this 6 months ago

transitive-bullshit commented 1 year ago

https://github.com/dqbd/tiktoken

For reference, I previously tested a bunch of Node.js tokenizers for accuracy and perf here: https://github.com/transitive-bullshit/compare-tokenizers
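
For quick context, both libraries expose a very similar encode API. A minimal sketch of what basic usage looks like (the tiktoken-node import shape is an assumption based on the snippets further down in this thread):

```ts
import { get_encoding } from '@dqbd/tiktoken'
import TiktokenNode from 'tiktoken-node' // import shape assumed

// @dqbd/tiktoken (WASM): the encoder holds WASM memory, so free it when done.
const wasmEncoder = get_encoding('gpt2')
console.log(wasmEncoder.encode('hello world'))
wasmEncoder.free()

// tiktoken-node (native/NAPI): no explicit free step in the snippets below.
const nativeEncoder = TiktokenNode.getEncoding('gpt2')
console.log(nativeEncoder.encode('hello world'))
```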

cc @dqbd

Thanks! 🙏

dqbd commented 1 year ago

Disclaimer: I'm working on https://github.com/dqbd/tiktoken

Hi! I have extended the compare-tokenizers benchmark to include tiktoken-node and directly compare @dqbd/tiktoken against tiktoken-node:

Tested on M1 Pro (arm64), 16 GB memory, Node v19.8.1.

| (index) | Task Name | Average Time (ms) | Variance (ms) |
| --- | --- | --- | --- |
| 0 | 'gpt3-tokenizer' | 27444 | 78653 |
| 1 | 'gpt-3-encoder' | 16506 | 62548 |
| 2 | '@dqbd/tiktoken gpt2' | 4194 | 626 |
| 3 | 'tiktoken-node gpt2' | 3923 | 53 |
| 4 | '@dqbd/tiktoken text-davinci-003' | 4117 | 58 |
| 5 | 'tiktoken-node text-davinci-003' | 3840 | 61 |

(I reordered the @dqbd/tiktoken and tiktoken-node rows for clarity; the PR can be found here: https://github.com/transitive-bullshit/compare-tokenizers/pull/1)

As we can see, tiktoken-node can be faster than @dqbd/tiktoken, but as far as I have measured, not 5-6x faster as claimed (https://github.com/openai/tiktoken/issues/22#issuecomment-1472901919).
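
For reference, the baseline entries above create each encoder once and reuse it across iterations, roughly like this sketch (the exact harness lives in compare-tokenizers; this is only an approximation of its shape):

```ts
// Baseline shape: encoders are created once, outside the measured function,
// so only encode() itself is timed.
const tiktokenGpt2 = get_encoding('gpt2')
const tiktokenNodeGpt2 = TiktokenNode.getEncoding('gpt2')

const baselineTasks = [
  {
    label: '@dqbd/tiktoken gpt2',
    encode: (i: string) => tiktokenGpt2.encode(i)
  },
  {
    label: 'tiktoken-node gpt2',
    encode: (i: string) => tiktokenNodeGpt2.encode(i)
  }
]
```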


I've considered some other cases as well, in case I've missed something:

What if we create a new instance for every iteration (out of 25)?

```ts
{
  label: '@dqbd/tiktoken gpt2',
  encode: (i: string) => {
    const tiktokenGpt2 = get_encoding('gpt2')
    const result = tiktokenGpt2.encode(i)
    tiktokenGpt2.free()
    return result
  },
},
{
  label: 'tiktoken-node gpt2',
  encode: (i: string) => {
    const tiktokenNode = TiktokenNode.getEncoding('gpt2')
    return tiktokenNode.encode(i)
  },
}
```

| (index) | Task Name | Average Time (ms) | Variance (ms) |
| --- | --- | --- | --- |
| 0 | '@dqbd/tiktoken gpt2' | 227556 | 49932 |
| 1 | '@dqbd/tiktoken text-davinci-003' | 219893 | 12080 |
| 2 | 'tiktoken-node gpt2' | 287464 | 241397 |
| 3 | 'tiktoken-node text-davinci-003' | 303615 | 690935 |
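
Part of the gap in this per-instance case is the one-time WASM setup plus the explicit free() on every call. A small sketch of how a caller could amortize that by caching encoders (the cache is not part of either library's API, just an illustration):

```ts
import { get_encoding } from '@dqbd/tiktoken'

// Cache encoders by name so repeated calls pay the WASM setup cost only once.
const encoderCache = new Map<string, ReturnType<typeof get_encoding>>()

function encodeCached(encoding: Parameters<typeof get_encoding>[0], text: string) {
  let encoder = encoderCache.get(encoding)
  if (!encoder) {
    encoder = get_encoding(encoding)
    encoderCache.set(encoding, encoder)
  }
  return encoder.encode(text)
}

// WASM memory is not garbage collected, so free the cached encoders when done:
// for (const encoder of encoderCache.values()) encoder.free()
```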

What if we add a much larger fixture in the test suite?

```ts
fixtures.push(fixtures[fixtures.length - 1].repeat(100))
```

| (index) | Task Name | Average Time (ms) | Variance (ms) |
| --- | --- | --- | --- |
| 0 | '@dqbd/tiktoken gpt2' | 240056 | 33911 |
| 1 | '@dqbd/tiktoken text-davinci-003' | 238467 | 58049 |
| 2 | 'tiktoken-node gpt2' | 231459 | 114094 |
| 3 | 'tiktoken-node text-davinci-003' | 228842 | 126852 |

Update 9/4/2023: Ran the tests again with iterations: 25

Maybe I'm missing something else here? Would it be possible for you to share your benchmarks as well so we can compare, @ceifa?


Functionality-wise, @dqbd/tiktoken supports more environments (Edge Functions: with ./lite we even fit within the 1 MB limit) and platforms (browsers, where only WASM is supported). There is some merit in the NAPI approach though, as parallelisation is actually supported in NAPI, so it might be useful to dig deeper into it.
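
For the parallelisation point, a rough sketch of what that could look like with Node's worker_threads and one native encoder per worker (assumes an ES module context; the tiktoken-node import shape and the encode() return type are assumptions based on the snippet above):

```ts
import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads'

if (isMainThread) {
  const texts = ['first document', 'second document', 'third document']

  // Spawn one worker per text; each worker encodes independently, so the
  // native encoder can run on several threads at once.
  const results = await Promise.all(
    texts.map(
      (text) =>
        new Promise<number[]>((resolve, reject) => {
          const worker = new Worker(new URL(import.meta.url), { workerData: text })
          worker.once('message', resolve)
          worker.once('error', reject)
        })
    )
  )
  console.log(results.map((tokens) => tokens.length))
} else {
  // Each worker builds its own encoder (import shape assumed, as above).
  const TiktokenNode = await import('tiktoken-node')
  const encoder = TiktokenNode.getEncoding('gpt2')
  parentPort!.postMessage(encoder.encode(workerData as string))
}
```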

ceifa commented 1 year ago

Hey @dqbd 👋 I ran my benchmark inside a simple project I was building, so yes, my results may well be inaccurate. Your project definitely has many more features and is much better maintained than mine!

Thanks for adding my project to @transitive-bullshit's benchmark; now I can try to improve it with things like parallelism, as you said. I think we can work together to deliver the best of both worlds, since my project can sometimes be faster and yours can run anywhere.