aws-neuron / transformers-neuronx

Apache License 2.0

How to use generate() with inputs_embeds #70

Closed liechtym closed 6 months ago

liechtym commented 6 months ago

I hope this is the right place to ask this question. Let me know if I need to move to another repo.

Currently I'm using NeuronModelForCausalLM, which uses LlamaForSampling under the hood.

I have a use case where I need to be able to do the following:

  1. Compute the token embeddings for a prompt
  2. Modify those embeddings
  3. Run inference from the modified embeddings

I am able to do steps 1 & 2 currently using the following:

from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

llama_model = NeuronModelForCausalLM.from_pretrained('aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-1')

# token_ids was not defined in the original snippet; obtained here with the
# matching tokenizer for completeness
tokenizer = AutoTokenizer.from_pretrained('aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-1')
token_ids = tokenizer('Some prompt', return_tensors='pt').input_ids

# Step 1: look up the token embeddings from the model's embedding layer
embedded_tokens = llama_model.model.chkpt_model.model.embed_tokens(token_ids)

### Code to modify embedded_tokens
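As a purely hypothetical illustration of step 2, the modification could, for example, prepend a few extra vectors in embedding space, soft-prompt style; the actual edit of course depends on the use case:

import torch

# Hypothetical modification: prepend 4 extra vectors in embedding space.
# Assumes embedded_tokens has shape (batch, seq_len, hidden_size).
soft_prompt = torch.randn(embedded_tokens.shape[0], 4, embedded_tokens.shape[-1])
embedded_tokens = torch.cat([soft_prompt, embedded_tokens], dim=1)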

However, as far as I can tell, generating from these modified embeddings is not possible with llama_model.generate().

When I use the 'inputs_embeds' keyword argument and set input_ids=None, I get the following:

ValueError: The following `model_kwargs` are not used by the model: ['inputs_embeds']
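For reference, the failing call looks roughly like this (reconstructed from the description above; the exact invocation is not shown):

output_ids = llama_model.generate(input_ids=None, inputs_embeds=embedded_tokens)  # raises the ValueError above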

If this is not currently possible with NeuronModelForCausalLM.generate(), is there a way to work around it manually? If so, could you provide an example?
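For comparison, here is a minimal sketch of the behavior I'm after using the plain (non-Neuron) transformers Llama implementation, where, as far as I understand, generate() does accept inputs_embeds for decoder-only models (checkpoint name and prompt are placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Plain CPU/GPU checkpoint, just for illustration (not the Neuron-compiled model)
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf')
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf')

token_ids = tokenizer('Some prompt', return_tensors='pt').input_ids
embeds = model.get_input_embeddings()(token_ids)
# ... modify embeds here ...

# Upstream generate() accepts inputs_embeds when input_ids is omitted;
# in this mode the returned ids contain only the newly generated tokens.
output_ids = model.generate(inputs_embeds=embeds, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))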

Thanks very much for your help!

aws-taylor commented 6 months ago

Hello @liechtym,

I think this may be more appropriate for https://github.com/huggingface/optimum-neuron.

-T

liechtym commented 6 months ago

Thanks. Moved to https://github.com/huggingface/optimum-neuron/issues/395