ankane / tokenizers-ruby

Fast state-of-the-art tokenizers for Ruby
Apache License 2.0

Using tiktoken #12

ScotterC closed 1 year ago

ScotterC commented 1 year ago

Hey, thanks for writing this gem (and all the other ML ones)!

This may be a naive question, but can tiktoken be invoked as a tokenizer?

ankane commented 1 year ago

tiktoken is another tokenization library (rather than a specific encoding/model). You can use either library for GPT-2 tokenization, but it looks like tiktoken supports a few other encodings that aren't on the Hugging Face hub yet.
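For anyone landing here, a minimal sketch of GPT-2 tokenization with this gem might look like the following. It assumes the `gpt2` tokenizer can be downloaded from the Hugging Face hub; `gpt2_round_trip` is a hypothetical helper name used only for illustration and is not part of the gem.

```ruby
# Sketch only: encode text with the GPT-2 tokenizer and decode it back.
# `gpt2_round_trip` is a hypothetical helper, not part of tokenizers-ruby.
# Assumption: the "gpt2" tokenizer is available on the Hugging Face hub
# and the machine has network access the first time it is loaded.
def gpt2_round_trip(text, tokenizer: nil)
  tokenizer ||= begin
    require "tokenizers"               # provided by the tokenizers gem
    Tokenizers.from_pretrained("gpt2") # loads the pretrained GPT-2 tokenizer
  end

  encoding = tokenizer.encode(text)

  {
    ids: encoding.ids,                      # integer token ids
    tokens: encoding.tokens,                # string form of each token
    decoded: tokenizer.decode(encoding.ids) # text rebuilt from the ids
  }
end
```

Calling `gpt2_round_trip("Hello world")` should return the token ids, the token strings, and the decoded text. The `tokenizer:` keyword is there only so a preloaded tokenizer can be reused instead of loading one on every call.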