Open liechtym opened 9 months ago
In the transformers repo they said the HuggingFaceGenerationModelAdapter
incompatibility error probably stems from the transformers-neuronx wrapper. Any help with this?
Here is the error:
Traceback (most recent call last):
File "modular.py", line 107, in <module>
chatbot = MiniGPT4LLama2Chatbot(cfg_path, gpu_id)
File "modular.py", line 62, in __init__
self.model = model_cls.from_config(model_config)
File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 173, in from_config
model = cls(
File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 45, in __init__
super().__init__(
File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt_base.py", line 43, in __init__
self.llama_model, self.llama_tokenizer = self.init_llm(
File "/home/ubuntu/MiniGPT-4/minigpt4/models/base_model.py", line 202, in init_llm
llama_model = HuggingFaceGenerationModelAdapter(llama_model_cpu.config, llama_model_neuron)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/generation_utils.py", line 18, in __init__
super().__init__(config)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1190, in __init__
config = self._autoset_attn_implementation(
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1311, in _autoset_attn_implementation
config = cls._check_and_enable_sdpa(
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1464, in _check_and_enable_sdpa
raise ValueError(
ValueError: HuggingFaceGenerationModelAdapter does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
See more details on the issue page: https://github.com/huggingface/transformers/issues/28396.
Of course, my general goal is simply to get this working with input embeddings, so if this is not the right route, let me know.
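For reference, one possible workaround sketch, untested here: force the "eager" attention code path so transformers skips the SDPA capability check when the adapter's config is set up. _attn_implementation is a private transformers attribute (added around v4.36), so treat the whole snippet as an assumption based on the code paths in the traceback; llama_model_cpu and llama_model_neuron are the objects from base_model.py above.

from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter

# Assumption: setting the private _attn_implementation attribute to "eager"
# makes _autoset_attn_implementation honor it and skip _check_and_enable_sdpa.
config = llama_model_cpu.config
config._attn_implementation = "eager"
llama_model = HuggingFaceGenerationModelAdapter(config, llama_model_neuron)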
Hi @liechtym, we do not have support for external embeddings. One way you could potentially get around this is by replacing the model embedding weights directly. Please let us know if that helps.
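A minimal sketch of that idea, assuming the embedding table is replaced on the CPU-side checkpoint before it is compiled for Neuron (the model name, slot token id, and replacement vector below are illustrative, not a tested recipe):

import torch
from transformers import LlamaForCausalLM

# Load the CPU-side checkpoint that will later be compiled for Neuron.
cpu_model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

with torch.no_grad():
    emb = cpu_model.model.embed_tokens.weight  # [vocab_size, hidden_size]
    # Example modification: overwrite the row for one "slot" token id with an
    # externally computed vector (random here purely for illustration).
    slot_id = 100                              # hypothetical slot token
    emb[slot_id] = torch.randn(emb.shape[1], dtype=emb.dtype)

# Compile/load cpu_model for Neuron as usual; lookups of slot_id now return
# the injected vector instead of the trained embedding.

Note this only swaps vectors at fixed vocabulary positions; it does not give per-step, arbitrary input embeddings the way inputs_embeds would.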
@shebbur-aws Thanks for your reply. A workaround is totally fine for me. Would you be able to give a quick explanation/example for how to replace the embedding weights and run the forward pass on the rest of the model?
Could I get help on this, @shebbur-aws?
@liechtym @shebbur-aws Hi~ I've got the same situation here; do you have any resolution or workaround for this? I need to pass input embeddings as the model input instead of input ids. Thanks~
Compiling and loading Llama 2 in Neuron is working great for me on an inf2.8xlarge with the new 2.16 release. However, I have a unique use case where I need to input embeddings directly into Llama 2 instead of token ids: I need to generate the embeddings, modify them, and then use the modified embeddings for generation. I was already able to generate the embeddings separately via llama_model.chkpt_model.model.embed_tokens(token_ids). However, I'm not seeing a way to plug those embeddings back into the model once I've modified them.

It seems to me that LlamaForSampling.sample() (from transformers_neuronx.llama.model) probably can't do this (correct me if I'm wrong); I got TypeError: sample() got an unexpected keyword argument 'inputs_embeds' when I tried. So I tried using the HuggingFaceGenerationModelAdapter from transformers_neuronx.generation_utils to enable the generation API, as was done in the GPT2 example. However, there was an error that prevented that (the traceback above), which I filed an issue for in the transformers repo.

What is the best way to go about doing this? I really appreciate your help.
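For concreteness, here is roughly the flow I'm after, following the transformers_neuronx Llama usage pattern (the checkpoint path and compile parameters are illustrative, and the commented-out call is the unsupported piece):

import torch
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

# Compile/load Llama 2 for Neuron (illustrative parameters for inf2.8xlarge).
neuron_model = LlamaForSampling.from_pretrained("meta-llama/Llama-2-7b-hf",
                                                batch_size=1, tp_degree=2, amp="f16")
neuron_model.to_neuron()

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
token_ids = tokenizer("Hello", return_tensors="pt").input_ids

# Step 1: get embeddings from the checkpoint-side embedding table.
embeds = neuron_model.chkpt_model.model.embed_tokens(token_ids)
# Step 2: modify them (stand-in here for the real modification).
embeds = embeds * 1.0
# Step 3: what I'd like to do, but sample() has no inputs_embeds kwarg:
# output = neuron_model.sample(inputs_embeds=embeds, sequence_length=128)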