MDK8888 / GPTFast

Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch.
Apache License 2.0

Help to run GPTFast on Mixtral-8x7B-Instruct-v0.1 #25

Open · davideuler opened this issue 4 months ago

davideuler commented 4 months ago

Could you help to give an example code to run GPTFast on Mixtral-8x7B-Instruct-v0.1?

I load the model with GPTFast with an empty draft_model_name. The following error is raised while loading the model.

import torch
from transformers import AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "./Mixtral-8x7B-v0.1"
draft_model_name = ""

tokenizer = AutoTokenizer.from_pretrained(model_name)
initial_string = "Write me a short story."
input_tokens = tokenizer.encode(initial_string, return_tensors="pt").to(device)

# ....

Traceback (most recent call last):
  File "/data/gptfast.py", line 77, in <module>
    gpt_fast_model = gpt_fast(model_name, sample_function=argmax, max_length=60, cache_config=cache_config, draft_model_name=draft_model_name)
  File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/GPTFast.py", line 11, in gpt_fast
    model = add_kv_cache(model, sample_function, max_length, cache_config, dtype=torch.float16)
  File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/KVCache/KVCacheModel.py", line 208, in add_kv_cache
    model = KVCacheModel(transformer, sampling_fn, max_length, cache_config, dtype)
  File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/KVCache/KVCacheModel.py", line 21, in __init__
    self._model = self.add_static_cache_to_model(model, cache_config, max_length, dtype, self.device)
  File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Core/KVCache/KVCacheModel.py", line 48, in add_static_cache_to_model
    module_forward_str_kv_cache = add_input_pos_to_func_str(module_forward_str, forward_prop_ref, "input_pos=input_pos")
  File "/root/anaconda3/envs/llm/lib/python3.10/site-packages/GPTFast/Helpers/String/add_input_pos_to_func_str.py", line 18, in add_input_pos_to_func_str
    raise ValueError("Submodule forward pass not found.")
ValueError: Submodule forward pass not found.
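(For context: the traceback shows GPTFast rewriting the model's forward pass at the source-string level to thread an input_pos argument through to its static cache. The toy sketch below is not GPTFast's actual implementation; the regex and call structure are assumptions, intended only to illustrate how this kind of source patching ends up raising "Submodule forward pass not found." when a model's forward, such as Mixtral's MoE forward, doesn't contain the call pattern the patcher expects.)

import re

def patch_input_pos(func_str: str, submodule_ref: str, kwarg: str) -> str:
    # Look for a call like `self.model(...)` in the forward source
    # and append `input_pos=input_pos` to its argument list.
    pattern = rf"{re.escape(submodule_ref)}\((.*?)\)"
    match = re.search(pattern, func_str)
    if match is None:
        # Nothing in the source matched the expected submodule call,
        # so the patcher has to give up, as in the traceback above.
        raise ValueError("Submodule forward pass not found.")
    patched_call = f"{submodule_ref}({match.group(1)}, {kwarg})"
    return func_str.replace(match.group(0), patched_call, 1)

src = "def forward(self, x):\n    return self.model(x)"
print(patch_input_pos(src, "self.model", "input_pos=input_pos"))
# -> def forward(self, x):
#        return self.model(x, input_pos=input_pos)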

MDK8888 commented 4 months ago

Hey David, apologies for the late response. Mixtral should support static caching natively, and a new branch should be up this weekend or early next week with the fixes.
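(For readers who land here before that branch is up: below is a minimal sketch of the native static caching MDK8888 mentions, using plain transformers rather than GPTFast. It assumes a transformers release recent enough to support cache_implementation="static" for Mixtral, plus accelerate for device_map="auto"; the model ID and generation settings are illustrative.)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Write me a short story.", return_tensors="pt").to(model.device)
# cache_implementation="static" asks generate() to use the native static KV cache
output = model.generate(
    **inputs, max_new_tokens=60, do_sample=False, cache_implementation="static"
)
print(tokenizer.decode(output[0], skip_special_tokens=True))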

davideuler commented 4 months ago

> Hey David, apologies for the late response. Mixtral should support static caching natively, and a new branch should be up this weekend or early next week with the fixes.

Thanks, looking forward to the new branch.