Closed muhammadfhadli1453 closed 9 months ago
Just use the original one. If the tokenizer.model is in a different directory, you can use the --vocab-dir argument.
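For example (a sketch assuming llama.cpp's convert.py; both directory paths below are hypothetical placeholders), the invocation would look like:

```shell
# Sketch: paths are hypothetical; convert.py is llama.cpp's conversion script.
MODEL_DIR=./my-finetuned-model   # directory with the fine-tuned weights
VOCAB_DIR=./base-model           # directory that actually contains tokenizer.model
# Print the command rather than running it, since the paths are placeholders:
echo python convert.py "$MODEL_DIR" --vocab-dir "$VOCAB_DIR"
```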
What do you mean by the original one? Can you explain, please?
He means from the base model you fine-tuned.
I see... but I fine-tuned the model on a different language. Will it still work?
I think it would depend on whether you made changes to the vocabulary in addition to training (like adding tokens, etc.). If it was just training, then I believe it would work. I'm not 100% sure about this, though.
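A quick way to check that point is to diff the two vocabularies. This is just a sketch with made-up tokens and ids, not tied to any real tokenizer:

```python
# Sketch: compare a base vocab with a fine-tuned vocab to see whether
# fine-tuning added tokens. The dicts below are made-up illustrations.
base_vocab = {"<s>": 0, "</s>": 1, "hello": 2}
finetuned_vocab = {"<s>": 0, "</s>": 1, "hello": 2, "<custom_tok>": 3}

added_tokens = sorted(set(finetuned_vocab) - set(base_vocab))
print(added_tokens)  # non-empty means the base tokenizer.model no longer matches
```

If the diff is non-empty, reusing the base model's tokenizer.model will produce a mismatched conversion.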
Same question here.
And the answer was given: the tokenizer model is embedded in the resulting file, so you need one that matches the model you are trying to convert.
Many (most) of the base models I've seen on Hugging Face do not have a file named tokenizer.model. So I am also having the same issue.
Same issue
Same issue. The base model also doesn't have a tokenizer.model. Is there a way to get the tokenizer from the Hugging Face AutoTokenizer?
same issue
I have found a solution for this problem. The default vocab type is 'spm', which invokes a SentencePiece tokenizer, but some models use a Byte-Pair Encoding (BPE) tokenizer instead. To convert a BPE-based model, use this syntax:
convert.py modelname_or_path --vocabtype bpe
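In practice you can often guess the vocab type from which tokenizer files the model directory ships: SentencePiece models include a tokenizer.model file, while BPE models typically ship tokenizer.json / vocab files instead. A sketch (the directory name is hypothetical):

```shell
# Sketch: pick the vocab type based on which tokenizer file is present.
# SentencePiece models ship tokenizer.model; BPE models usually don't.
MODEL_DIR=./mymodel   # hypothetical model directory
if [ -f "$MODEL_DIR/tokenizer.model" ]; then
  VOCABTYPE=spm
else
  VOCABTYPE=bpe
fi
echo "$VOCABTYPE"
```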
Note: in current versions the flag is spelled --vocab-type (with a hyphen).
He means from the base model you fine-tuned.

The llama models downloaded from Meta (all of them) do not include a tokenizer. I have the same issue.
Go to Hugging Face, search for the model, download the tokenizer separately, and move it into the folder that is missing it.
I am here with the same problem, trying to convert Llama 3 70B. I don't know what is meant by "go to huggingface and search the model, download the tokenizer separated"... there is no tokenizer.model on the Llama 3 70B page, and searching for it turns up nothing. Where can I download the tokenizer for this?
Here: https://huggingface.co/meta-llama/Meta-Llama-3-8B/tree/main/original. Put the tokenizer.model in your model folder and then use --vocab-type bpe as stated above; it worked for me.
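One way to fetch just that one file is the huggingface-cli download command (it needs the `huggingface_hub` package installed, and the meta-llama repos are gated, so you must be logged in with accepted access). Printed as a dry run here since the repo is gated:

```shell
# Sketch: download only the tokenizer from the gated Llama 3 repo.
# Requires `pip install huggingface_hub` and `huggingface-cli login` first.
REPO=meta-llama/Meta-Llama-3-8B
FILE=original/tokenizer.model
echo huggingface-cli download "$REPO" "$FILE" --local-dir .
```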
When I ran this command:
I encountered the following error:
After training the llama2 model, I do not have a "tokenizer.model" file. Instead, the model directory contains the following files:
What should I do to resolve this issue?
*Note: I followed this tutorial for fine-tuning: https://blog.ovhcloud.com/fine-tuning-llama-2-models-using-a-single-gpu-qlora-and-ai-notebooks/