Closed joostinyi closed 4 months ago
Could we point the tokenizer at this repo for now to avoid the hf token dependency? https://huggingface.co/baseten/Meta-Llama-3-tokenizer
@vshulman Is this the meta-llama/Meta-Llama-3-XB-Instruct tokenizer or the base model?
Could we point the tokenizer at this repo for now to avoid the hf token dependency? https://huggingface.co/baseten/Meta-Llama-3-tokenizer
@vshulman Is this the meta-llama/Meta-Llama-3-XB-Instruct tokenizer or the base model?
Good call out. It's the instruct tokenizer. To reduce confusion -- here's a new target: baseten/Meta-Llama-3-Instruct-tokenizer
and I will update the other hf model to be the base model tokenizer
update tokenizer repository reference to pick up upstream changes to tokenizer https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/commit/4d6c61da057c45bfc4dc4d3bfa5a691ecb9ce0cf. This is needed to remedy the issue with end of sequence token acknowledgement by Llama3 Instruct models.
This change requires that users have
hf_access_token
and have been granted access to the granted Llama3 repo. Tagging @squidarth for model library implications