Closed sayli-ds closed 10 months ago
Hi sayli-ds: The HuggingFaceGenerationModelAdapter supports all of Hugging Face's GenerationMixin.generate functionality, so you can refer to the GenerationMixin.generate parameters to see the list of parameters that HuggingFaceGenerationModelAdapter supports. It looks like GenerationMixin.generate does not support returning only the generated text (see https://github.com/huggingface/transformers/issues/17117). However, you can slice each output sequence at the prompt length to decode just the generated text:
```python
generated_sequences = [tokenizer.decode(seq[encoded_input.input_ids.shape[1]:]) for seq in sample_output]
```
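As a toy illustration of why that slice works (no model involved; `fake_vocab` and `decode` below are made-up stand-ins for a real tokenizer), `generate` returns the prompt tokens followed by the new tokens, so slicing at the prompt length keeps only the newly generated IDs:

```python
# Hypothetical vocabulary standing in for a real tokenizer (illustration only).
fake_vocab = {0: "Hello,", 1: "I'm", 2: "a", 3: "language", 4: "model,",
              5: "and", 6: "I", 7: "write"}

def decode(ids):
    # Stand-in for tokenizer.decode: map token IDs back to text.
    return " ".join(fake_vocab[i] for i in ids)

prompt_ids = [0, 1, 2, 3, 4]          # stands in for encoded_input.input_ids[0]
full_output = prompt_ids + [5, 6, 7]  # generate() returns prompt + new tokens

prompt_len = len(prompt_ids)          # corresponds to encoded_input.input_ids.shape[1]
generated_only = decode(full_output[prompt_len:])
print(generated_only)  # "and I write"
```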
We will close this ticket, but please feel free to open another one if you experience other issues.
Where is the list of parameters available for model.generate (the Hugging Face generate support) in the last step? I want the output devoid of any text from the prompt.
```python
from transformers import AutoTokenizer, LlamaForCausalLM
from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter

model_cpu = LlamaForCausalLM.from_pretrained('models--meta-llama--Llama-2-13b-hf/')
model_neuron = neuron_model

# Use the HuggingFaceGenerationModelAdapter to access the generate API
model = HuggingFaceGenerationModelAdapter(model_cpu.config, model_neuron)

# Get a tokenizer and example input
tokenizer = AutoTokenizer.from_pretrained('models--meta-llama--Llama-2-13b-hf/')
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side = 'left'
text = "Hello, I'm a language model,"
encoded_input = tokenizer(text, return_tensors='pt', padding=True)

# Run inference using temperature
model.reset_generation()
sample_output = model.generate(
    input_ids=encoded_input.input_ids,
    attention_mask=encoded_input.attention_mask,
    do_sample=True,
    max_length=256,
    temperature=0.7,
)
```
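Note that `padding_side = 'left'` matters for stripping the prompt in a batch: left padding puts pad tokens before each prompt, so every row of `input_ids` ends at the same column, and a single slice index (`input_ids.shape[1]`) removes the prompt for all rows at once. A minimal sketch with plain lists (`PAD = -1` is a made-up marker, not a real token ID):

```python
PAD = -1  # hypothetical pad marker for illustration only

# Two prompts of different lengths, left-padded to the same width.
batch = [
    [PAD, PAD, 10, 11],   # short prompt, padded on the left
    [12, 13, 14, 15],     # full-width prompt
]
prompt_width = len(batch[0])  # corresponds to input_ids.shape[1]

# generate() appends new tokens after the (padded) prompt in each row.
outputs = [row + new for row, new in zip(batch, [[20, 21], [22, 23]])]

# One slice index strips the prompt for every row in the batch.
generated = [row[prompt_width:] for row in outputs]
print(generated)  # [[20, 21], [22, 23]]
```

With right padding, by contrast, each row's prompt would end at a different column and a single slice index would leave pad tokens mixed into the output.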