Does the model support FP16 inference?

Hello,

I was looking for ways to increase the inference speed, and one thing I thought would be useful was to use FP16. For this, I called model.half() after loading it. Unfortunately, it generated RuntimeError: "LayerNormKernelImpl" not implemented for 'Half' error. I was wondering if there is a way to use FP16 during inference? (Or any other trick to accelerate inference).

# This works:
model = MvpForConditionalGeneration.from_pretrained('RUCAIBox/mvp')
inputs = tokenizer(
    ["Describe the following data: Iron Man | instance of | Superhero [SEP] Stan Lee | creator | Iron Man",
     "Describe the following data: Batman | instance of | Superhero",
    ]
    return_tensors="pt",
)

generated_ids = model.generate(**inputs)

tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
['Iron Man is a fictional superhero appearing in American comic books published by Marvel Comics.',
"Batman is a superhero"]


# This doesn't:
model = model.half()
generated_ids = model.generate(**inputs)

RUCAIBox / MVP

Does the model support FP16 inference? #9