Closed cliangyu closed 9 months ago
Hi Jiarui! Great implementation! I found PY007/TinyLlama-1.1B-intermediate-step-240k-503b generates repetitive words. Maybe that's why speculative sampling doesn't work for me. My script:
python main.py \
    --input "Write a 1000-word essay on the US constitutions" \
    --target_model_name transformers_cache/llama-2-7b-hf \
    --approx_model_name PY007/TinyLlama-1.1B-intermediate-step-240k-503b
https://huggingface.co/PY007/TinyLlama-1.1B-Chat-v0.1
is preferred by the author. However, the tokenizer size of that model is 32001, while the tokenizer sizes of llama-2-7b-hf and PY007/TinyLlama-1.1B-intermediate-step-240k-503b are 32000. Probably fixing the pad token will help?
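One way to work around the off-by-one vocab mismatch, assuming the extra 32001st entry is just the added pad token sitting at the end of the vocabulary, is to truncate the larger distribution to the shared prefix and renormalize before comparing the two models' next-token probabilities. This is only a sketch with toy arrays, not the repo's actual code:

```python
import numpy as np

def align_vocab(probs_small, probs_large):
    """Align two next-token distributions whose vocab sizes differ
    (e.g. 32000 vs 32001, where the extra trailing entry is a pad
    token). Truncates the larger one to the shared prefix and
    renormalizes both. Assumes the shared tokens occupy the same
    indices in both vocabularies."""
    n = min(len(probs_small), len(probs_large))
    p, q = probs_small[:n].copy(), probs_large[:n].copy()
    return p / p.sum(), q / q.sum()

# toy example: "vocabs" of 4 vs 5 entries; last entry of the larger
# one stands in for the pad token
p_small = np.array([0.1, 0.2, 0.3, 0.4])
p_large = np.array([0.1, 0.2, 0.3, 0.35, 0.05])
p, q = align_vocab(p_small, p_large)
```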
Hello Cliang, TinyLlama seems to use the llama-1 architecture, and I can hardly find a tiny llama-2. I am not sure whether two models that use different vocab tables can work together. I suggest you pick two models that use the same tokenizer.
I realized
PY007/TinyLlama-1.1B-Chat-v0.1
also generates repetitive words. So I'm not certain which approx model we should use for llama.
First, no matter what kind of approx model is used, you can always achieve the same generation distribution as the target model. Second, if you pick an approx model more similar to the target one, the odds of rejection during sampling will be smaller, which makes sampling more efficient.
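The guarantee above comes from the accept/reject rule of speculative sampling: accept the draft token with probability min(1, p(x)/q(x)), and on rejection resample from the normalized residual max(p - q, 0), which reproduces the target distribution p exactly for any draft distribution q. A minimal sketch over toy distributions (not the repo's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_accept(x, q, p):
    """One accept/reject step of speculative sampling.
    x: token index proposed by the approx model, drawn from q.
    q: approx-model next-token distribution.
    p: target-model next-token distribution.
    Accepting with prob min(1, p[x]/q[x]) and resampling rejects
    from norm(max(p - q, 0)) yields a sample from p exactly,
    whatever q is; a q closer to p just rejects less often."""
    if rng.random() < min(1.0, p[x] / q[x]):
        return x  # accept the draft token
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual)  # resample on rejection
```

Sampling many tokens this way and tallying frequencies recovers p, even when q is quite different.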
TinyLlama-1.1B adopted exactly the same architecture and tokenizer as Llama 2. ref: https://github.com/jzhang38/TinyLlama/tree/main?tab=readme-ov-file#tinyllama-11b