bhavnicksm / autotiktokenizer

🧰 The AutoTokenizer that TikToken always needed -- Load any tokenizer with TikToken now! ✨
https://pypi.org/project/autotiktokenizer/
MIT License

Load offline tokenizers with AutoTikTokenizer #10

Closed: bhavnicksm closed this issue 1 day ago

bhavnicksm commented 2 weeks ago

AutoTikTokenizer should be able to load an offline tokenizer stored at a path on disk. If it cannot load from the given path, it should raise an appropriate error.

For example:

from autotiktokenizer import AutoTikTokenizer

tokenizer = AutoTikTokenizer.from_pretrained("<PATH-TO-TOKENIZER>")

This is in regard to: issue
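A minimal sketch of the kind of path check described above, assuming a locally saved tokenizer is a directory that contains a tokenizer.json file. resolve_local_tokenizer and REQUIRED_FILE are hypothetical names for illustration only. They are not part of the autotiktokenizer API, and this is not how the eventual fix was implemented. The sketch only shows one way to fail early with a clear error before any loading is attempted.

```python
import os
import tempfile

# Assumption: a locally saved Hugging Face tokenizer directory contains tokenizer.json.
REQUIRED_FILE = "tokenizer.json"


def resolve_local_tokenizer(path: str) -> str:
    """Hypothetical pre-check: return the tokenizer.json path inside `path`, or raise."""
    path = os.path.expanduser(path)  # allow paths like "~/models/my-tokenizer"
    if not os.path.exists(path):
        raise FileNotFoundError(f"Tokenizer path does not exist: {path!r}")
    if not os.path.isdir(path):
        raise NotADirectoryError(f"Expected a tokenizer directory, got a file: {path!r}")
    candidate = os.path.join(path, REQUIRED_FILE)
    if not os.path.isfile(candidate):
        raise FileNotFoundError(f"No {REQUIRED_FILE} found in {path!r}")
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        try:
            resolve_local_tokenizer(tmp)  # empty directory: fails with a clear message
        except FileNotFoundError as err:
            print("error:", err)
        # Once a tokenizer.json exists in the directory, the same call succeeds
        open(os.path.join(tmp, REQUIRED_FILE), "w").close()
        print(resolve_local_tokenizer(tmp))
```

Raising distinct built-in exceptions (FileNotFoundError for a missing path or file, NotADirectoryError when a file is passed instead of a directory) lets callers tell the failure modes apart without parsing error strings.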

not-lain commented 1 week ago

First, let's define where these files are downloaded. According to the Hugging Face docs, cached files can be found in ~/.cache/huggingface/hub. For compatibility across operating systems, you can build this path with:

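import os
# Default Hugging Face Hub cache directory, resolved from the current user's home on any OS
cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "huggingface", "hub")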
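As a rough sketch of how that cache directory could be used to find a model that was already downloaded, the helpers below assume the huggingface_hub cache layout: a repo such as org/name is stored under models--org--name, the file refs/main holds a commit hash, and snapshots/<commit_hash>/ holds the downloaded files. hf_hub_cache_dir, find_cached_snapshot, and build_fake_cache are hypothetical helpers, not the code from the fix. The environment-variable precedence (HF_HUB_CACHE, then HF_HOME/hub, then the home-directory default) is simplified and ignores other overrides.

```python
import os
import tempfile
from typing import Optional


def hf_hub_cache_dir() -> str:
    """Resolve the Hugging Face Hub cache directory (simplified precedence).

    HF_HUB_CACHE wins, then HF_HOME/hub, then ~/.cache/huggingface/hub.
    """
    explicit = os.environ.get("HF_HUB_CACHE")
    if explicit:
        return explicit
    hf_home = os.environ.get("HF_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache", "huggingface"
    )
    return os.path.join(hf_home, "hub")


def find_cached_snapshot(
    repo_id: str, revision: str = "main", cache_dir: Optional[str] = None
) -> str:
    """Return the snapshot directory for `repo_id` in the hub cache, or raise."""
    cache_dir = cache_dir or hf_hub_cache_dir()
    # "org/name" is stored on disk as "models--org--name"
    repo_dir = os.path.join(cache_dir, "models--" + repo_id.replace("/", "--"))
    # refs/<revision> is a small text file holding the commit hash
    ref_file = os.path.join(repo_dir, "refs", revision)
    if not os.path.isfile(ref_file):
        raise FileNotFoundError(
            f"{repo_id!r} at revision {revision!r} is not cached under {cache_dir!r}"
        )
    with open(ref_file, encoding="utf-8") as f:
        commit_hash = f.read().strip()
    snapshot = os.path.join(repo_dir, "snapshots", commit_hash)
    if not os.path.isdir(snapshot):
        raise FileNotFoundError(f"Snapshot directory is missing: {snapshot!r}")
    return snapshot


def build_fake_cache(root: str, repo_id: str, commit_hash: str) -> str:
    """Create a minimal fake cache entry under `root` (demonstration only)."""
    repo_dir = os.path.join(root, "models--" + repo_id.replace("/", "--"))
    os.makedirs(os.path.join(repo_dir, "refs"), exist_ok=True)
    os.makedirs(os.path.join(repo_dir, "snapshots", commit_hash), exist_ok=True)
    with open(os.path.join(repo_dir, "refs", "main"), "w", encoding="utf-8") as f:
        f.write(commit_hash + "\n")
    return os.path.join(repo_dir, "snapshots", commit_hash)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        expected = build_fake_cache(root, "org/name", "abc123")
        print(find_cached_snapshot("org/name", cache_dir=root) == expected)  # prints True
```

This sketch only resolves branch names through refs/; passing a raw commit hash as the revision would need an extra check against snapshots/ directly. A repo that is not in the cache raises FileNotFoundError instead of returning a path that does not exist.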

not-lain commented 1 week ago

Can I take this issue?

bhavnicksm commented 1 week ago

Hey @not-lain!

Yes, it would be great if you could, thanks~!

not-lain commented 1 day ago

cc @bhavnicksm, let's close this one, as it was fixed in #13.

bhavnicksm commented 1 day ago

Sure, sounds good!