aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

list of parameters available for generate method in HuggingFaceGenerationModelAdapter class #56

Closed sayli-ds closed 10 months ago

sayli-ds commented 10 months ago

Where can I find the list of parameters available for model.generate (the Hugging Face generate support, i.e. the last step in the code below)? I want the output to contain none of the text from the prompt.

```python
from transformers import AutoTokenizer, LlamaForCausalLM
from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter

model_cpu = LlamaForCausalLM.from_pretrained('models--meta-llama--Llama-2-13b-hf/')
model_neuron = neuron_model  # the previously compiled transformers-neuronx model

# Use the HuggingFaceGenerationModelAdapter to access the generate API
model = HuggingFaceGenerationModelAdapter(model_cpu.config, model_neuron)

# Get a tokenizer and example input
tokenizer = AutoTokenizer.from_pretrained('models--meta-llama--Llama-2-13b-hf/')
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side = 'left'
text = "Hello, I'm a language model,"
encoded_input = tokenizer(text, return_tensors='pt', padding=True)

# Run inference using temperature
model.reset_generation()
sample_output = model.generate(
    input_ids=encoded_input.input_ids,
    attention_mask=encoded_input.attention_mask,
    do_sample=True,
    max_length=256,
    temperature=0.7,
)
```

aws-rhsoln commented 10 months ago

Hi sayli-ds: The HuggingFaceGenerationModelAdapter supports all of the Hugging Face GenerationMixin.generate functionality, so you can refer to the GenerationMixin.generate documentation for the full list of supported parameters. It looks like GenerationMixin.generate does not have an option to return only the newly generated text (see https://github.com/huggingface/transformers/issues/17117). However, you can do something like the following to decode only the generated portion:

```python
# Skip the prompt tokens (the first input_ids.shape[1] tokens of each sequence) before decoding
generated_sequences = [
    tokenizer.decode(seq[encoded_input.input_ids.shape[1]:]) for seq in sample_output
]
```
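For reference, any other GenerationMixin.generate argument can be passed through the adapter in the same way. Below is a minimal sketch (not from the original thread) that assumes the same `model`, `tokenizer`, and `encoded_input` as above; the parameter values are illustrative only:

```python
# Sketch only: sampling parameters such as top_k / top_p / max_new_tokens are
# standard GenerationMixin.generate arguments and are forwarded by the adapter.
model.reset_generation()
sample_output = model.generate(
    input_ids=encoded_input.input_ids,
    attention_mask=encoded_input.attention_mask,
    do_sample=True,
    max_new_tokens=128,  # generate at most 128 new tokens after the prompt
    top_k=50,            # sample from the 50 most likely tokens
    top_p=0.9,           # nucleus sampling threshold
    temperature=0.7,
)

# Decode only the newly generated tokens by slicing off the prompt length
prompt_length = encoded_input.input_ids.shape[1]
generated_texts = tokenizer.batch_decode(
    sample_output[:, prompt_length:], skip_special_tokens=True
)
```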

We will close this ticket, but please feel free to open another one if you experience other issues.