I hope this is the right place to ask this question. Let me know if I need to move to another repo.
Currently I'm using NeuronModelForCausalLM which uses LlamaForSampling under the hood.
I have a use case where I need to be able to do the following:
1. Generate token embeddings
2. Modify those embeddings
3. Run inference from the modified embeddings
I am able to do steps 1 & 2 currently using the following:
from optimum.neuron import NeuronModelForCausalLM

llama_model = NeuronModelForCausalLM.from_pretrained('aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-1')

# token_ids is a tensor of input ids produced by the tokenizer
embedded_tokens = llama_model.model.chkpt_model.model.embed_tokens(token_ids)

### Code to modify embedded_tokens
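For context, the modification I have in mind is along these lines (a toy sketch: random tensors stand in for the real Llama embeddings, and the noise-based edit is just a placeholder for my actual logic):

```python
import torch

# Stand-in for the output of embed_tokens:
# shape (batch, seq_len, hidden_size); hidden_size is 4096 for Llama-2-7b
embedded_tokens = torch.randn(1, 8, 4096)

# Placeholder modification: perturb each embedding with small Gaussian noise
modified = embedded_tokens + 0.01 * torch.randn_like(embedded_tokens)

# The modified tensor keeps the same shape, so it should be a drop-in
# replacement for the original embeddings at inference time
assert modified.shape == embedded_tokens.shape
```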
However, as far as I can tell, generation from these modified embeddings is not possible with llama_model.generate().
When I pass the inputs_embeds keyword argument and set input_ids=None, I get the following:
ValueError: The following `model_kwargs` are not used by the model: ['inputs_embeds']
If this is not currently possible with NeuronModelForCausalLM.generate(), is there a way to work around it manually? If so, could you provide an example?
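To make the kind of workaround I'm hoping for concrete, here is a manual greedy-decoding sketch. It uses a tiny, randomly initialized GPT-2 from plain transformers purely for illustration; whether the equivalent forward call on the Neuron side accepts inputs_embeds is exactly what I don't know, so treat everything here as an assumption:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialized model so the sketch runs anywhere (not Neuron-specific)
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=100)
model = GPT2LMHeadModel(config).eval()

token_ids = torch.tensor([[1, 2, 3]])
embeds = model.transformer.wte(token_ids)          # step 1: embed the prompt
embeds = embeds + 0.01 * torch.randn_like(embeds)  # step 2: modify the embeddings

# Step 3: manual greedy loop, feeding inputs_embeds instead of input_ids
generated = []
for _ in range(5):
    with torch.no_grad():
        logits = model(inputs_embeds=embeds).logits
    # pick the highest-probability next token from the last position
    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    generated.append(next_id.item())
    # append the new token's (unmodified) embedding and continue decoding
    embeds = torch.cat([embeds, model.transformer.wte(next_id)], dim=1)

print(generated)  # five greedily decoded token ids
```

This recomputes the full sequence each step (no KV cache), which is fine for a sketch but presumably not what a Neuron-compiled model would want.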
Thanks very much for your help!