aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

Inference on llama 13B #55

Closed by sayli-ds 10 months ago

sayli-ds commented 10 months ago

```python
import time
import torch

# tokenizer and neuron_model are assumed to have been initialized earlier.
prompt = '''Translate English to French:

starfish => étoile de mer
campfire => feu de camp
snowflake => flocon de neige
dragonfly => libellule
maple tree => érable
thunderstorm => orage
seashell => coquillage
waterfall => cascade
hummingbird => colibri
pine cone => pomme de pin
lighthouse => phare
dandelion => pissenlit
cheese => '''

input_ids = tokenizer.encode(prompt, return_tensors="pt")

# run inference
with torch.inference_mode():
    start = time.time()
    generated_sequences = neuron_model.sample(input_ids, temperature=0.1,
                                              sequence_length=200, top_p=0.9)
    elapsed = time.time() - start

generated_sequences = [tokenizer.decode(seq) for seq in generated_sequences]
print(f'generated sequences {generated_sequences} in {elapsed} seconds')
```
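For reference, neuron_model and tokenizer were created roughly along these lines, following the Llama sampling example in this repo (the model ID, checkpoint path, and tp_degree below are placeholders rather than my exact setup):

```python
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

# Placeholder model ID / checkpoint path / tp_degree; adjust to your own setup.
tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_13b")
neuron_model = LlamaForSampling.from_pretrained("./llama-13b-split",
                                                batch_size=1, tp_degree=24,
                                                amp="f16")
neuron_model.to_neuron()  # compile and load the model onto the NeuronCores
```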

I am expecting the translation of cheese (fromage) as the output, but instead I am getting the entire prompt back in the output.

What is the Neuron equivalent of the return_full_text=False parameter (as in the Hugging Face pipeline)? This prompt works well in the Llama playground but not on Neuron. I don't want to generate whole paragraphs of output; I want to use this for a text extraction task.

aws-rhsoln commented 10 months ago

Hi sayli-ds: sample returns the full sequence, prompt tokens included, and we do not support returning only the generated text by default. If you would like to output just the generated text, you can decode only the tokens that come after the prompt, for example:

```python
generated_sequences = [tokenizer.decode(seq[input_ids.shape[1]:]) for seq in generated_sequences]
```
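Put together with your sampling snippet, that looks something like the following sketch (skip_special_tokens is optional and just removes markers such as </s>):

```python
# Keep only the tokens generated after the prompt, then decode them.
prompt_len = input_ids.shape[1]
outputs = [tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
           for seq in generated_sequences]
print(outputs)  # should contain only the completion, e.g. "fromage"
```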

Please let us know if this addresses the behavior you are looking for.

jyang-aws commented 10 months ago

Please feel free to re-open the issue if you need more help.