aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

list of parameters available for generate method in HuggingFaceGenerationModelAdapter class #56

Closed sayli-ds closed 10 months ago

sayli-ds commented 10 months ago

Where can I find the list of parameters available for model.generate (the Hugging Face generate support, i.e. the last step in the code below)? I want the output to contain none of the text from the prompt.

```python
from transformers import AutoTokenizer, LlamaForCausalLM
from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter

model_cpu = LlamaForCausalLM.from_pretrained('models--meta-llama--Llama-2-13b-hf/')
model_neuron = neuron_model  # the previously compiled transformers-neuronx model

# Use the HuggingFaceGenerationModelAdapter to access the generate API
model = HuggingFaceGenerationModelAdapter(model_cpu.config, model_neuron)

# Get a tokenizer and example input
tokenizer = AutoTokenizer.from_pretrained('models--meta-llama--Llama-2-13b-hf/')
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side = 'left'
text = "Hello, I'm a language model,"
encoded_input = tokenizer(text, return_tensors='pt', padding=True)

# Run inference using temperature
model.reset_generation()
sample_output = model.generate(
    input_ids=encoded_input.input_ids,
    attention_mask=encoded_input.attention_mask,
    do_sample=True,
    max_length=256,
    temperature=0.7,
)
```

aws-rhsoln commented 10 months ago

Hi sayli-ds: The HuggingFaceGenerationModelAdapter supports all of the Hugging Face GenerationMixin.generate functionality, so you can refer to the GenerationMixin.generate documentation for the full list of supported parameters. It looks like GenerationMixin.generate does not have an option to return only the newly generated text (see https://github.com/huggingface/transformers/issues/17117). However, you can do something like the following to decode only the generated portion:

```python
# Skip the prompt tokens (the first input_ids.shape[1] tokens of each sequence) before decoding
generated_sequences = [
    tokenizer.decode(seq[encoded_input.input_ids.shape[1]:]) for seq in sample_output
]
```
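For reference, any other GenerationMixin.generate argument can be passed through the adapter in the same way. Below is a minimal sketch (not from the original thread) that assumes the same `model`, `tokenizer`, and `encoded_input` as above; the parameter values are illustrative only:

```python
# Sketch only: sampling parameters such as top_k / top_p / max_new_tokens are
# standard GenerationMixin.generate arguments and are forwarded by the adapter.
model.reset_generation()
sample_output = model.generate(
    input_ids=encoded_input.input_ids,
    attention_mask=encoded_input.attention_mask,
    do_sample=True,
    max_new_tokens=128,  # generate at most 128 new tokens after the prompt
    top_k=50,            # sample from the 50 most likely tokens
    top_p=0.9,           # nucleus sampling threshold
    temperature=0.7,
)

# Decode only the newly generated tokens by slicing off the prompt length
prompt_length = encoded_input.input_ids.shape[1]
generated_texts = tokenizer.batch_decode(
    sample_output[:, prompt_length:], skip_special_tokens=True
)
```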

We will close this ticket, but please feel free to open another one if you experience other issues.