This repository is the official implementation of our paper MVP: Multi-task Supervised Pre-training for Natural Language Generation.
Apache License 2.0

Does the model support FP16 inference? #9

Closed ereday closed 1 year ago

ereday commented 1 year ago


I am looking for ways to speed up inference, and FP16 seemed like a useful option. I called model.half() after loading the model, but generation then failed with RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'. Is there a way to use FP16 during inference, or any other trick to accelerate it?

# This works:
from transformers import MvpForConditionalGeneration, MvpTokenizer

tokenizer = MvpTokenizer.from_pretrained('RUCAIBox/mvp')
model = MvpForConditionalGeneration.from_pretrained('RUCAIBox/mvp')
inputs = tokenizer(
    ["Describe the following data: Iron Man | instance of | Superhero [SEP] Stan Lee | creator | Iron Man",
     "Describe the following data: Batman | instance of | Superhero"],
    return_tensors="pt",
    padding=True,  # pad the shorter input so both fit in one batch
)

generated_ids = model.generate(**inputs)

tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
# ['Iron Man is a fictional superhero appearing in American comic books published by Marvel Comics.',
#  "Batman is a superhero"]

# This doesn't:
model = model.half()
generated_ids = model.generate(**inputs)  # RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

StevenTang1998 commented 1 year ago

Sorry, I am not familiar with this. Our model is based on the Hugging Face API, so you may find a solution in their GitHub issues or forum. Alternatively, the Accelerate library might work.
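
For readers who hit the same error: as far as I know, the "LayerNormKernelImpl" message is typically raised when a half-precision model runs on the CPU with a PyTorch build that has no FP16 LayerNorm kernel there, so one thing to try is keeping FP16 on a CUDA GPU and falling back to FP32 otherwise. The sketch below is not from the MVP authors. The helper names `choose_device_and_dtype` and `generate_descriptions` are made up for illustration, and it assumes a `transformers` release that includes the MVP classes and accepts the `torch_dtype` argument of `from_pretrained`.

```python
def choose_device_and_dtype(cuda_available: bool) -> tuple[str, str]:
    """Return (device, dtype name); FP16 is only chosen when a CUDA GPU is present."""
    # Assumption: half precision on CPU is what triggers the LayerNorm error,
    # so the CPU path stays in FP32.
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def generate_descriptions(texts: list[str], model_name: str = "RUCAIBox/mvp") -> list[str]:
    """Hypothetical helper: generate text for a batch of prompts, in FP16 on GPU if possible."""
    # Heavy imports are kept local so the helper above works without them.
    import torch
    from transformers import MvpForConditionalGeneration, MvpTokenizer

    device, dtype_name = choose_device_and_dtype(torch.cuda.is_available())

    tokenizer = MvpTokenizer.from_pretrained(model_name)
    # Load the weights directly in the chosen dtype, then move them to the device.
    model = MvpForConditionalGeneration.from_pretrained(
        model_name, torch_dtype=getattr(torch, dtype_name)
    ).to(device)
    model.eval()

    # The input tensors must live on the same device as the model.
    inputs = tokenizer(texts, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        generated_ids = model.generate(**inputs)
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```

Calling `generate_descriptions(["Describe the following data: Batman | instance of | Superhero"])` should return one string per input. The first call downloads the checkpoint, and whether FP16 actually speeds things up depends on the GPU, so it is worth timing both precisions.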