ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

Using t5-large in t5 notebook, the translation result is invalid #135

Open brevity2021 opened 2 years ago

brevity2021 commented 2 years ago

Hi,

First, thank you for the great work! I was playing with the t5 notebook in demo/generative-model. I built a Docker image through the Makefile and ran the notebook from the container.

I changed very little in the notebook, only added a few print statements. When I ran it with t5-small, everything worked fine. But when I switched to t5-large, the translation result in the Benchmark section became empty. I also printed out the generated tokens, and the results are:

text generated by ONNX:
Onnx tokens:
tensor([0, 2, 0, 1], device='cuda:0')

which is obviously not correct.
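For context on why that tensor decodes to an empty translation: the first few IDs in T5's SentencePiece vocabulary are reserved for special tokens (pad = 0, </s> = 1, <unk> = 2, the same across T5 checkpoints), so the generated sequence contains no real vocabulary tokens at all. A minimal sketch, without downloading the tokenizer:

```python
# T5's SentencePiece vocabulary reserves the lowest IDs for special tokens;
# these IDs are the same for t5-small, t5-base and t5-large.
T5_SPECIAL_TOKENS = {0: "<pad>", 1: "</s>", 2: "<unk>"}

def decode_special(token_ids):
    """Map token IDs to T5 special tokens; '?' marks a regular vocab ID."""
    return [T5_SPECIAL_TOKENS.get(t, "?") for t in token_ids]

print(decode_special([0, 2, 0, 1]))
# -> ['<pad>', '<unk>', '<pad>', '</s>']
```

Decoding such a sequence with `skip_special_tokens=True` yields an empty string, which matches the empty translation seen in the Benchmark section.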

I attach the notebook here for your reference. I suspect there may be numerical instability when converting to fp16, since that method depends on randomly generated data.

My experiment was running on a g5.2xlarge instance.
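The fp16 suspicion is plausible in principle: float16 has a maximum finite value of 65504, and intermediate activations in larger transformer checkpoints can exceed it, at which point values overflow to inf and downstream softmax/matmul operations produce NaN or degenerate tokens. A minimal NumPy illustration of the overflow itself (not the notebook's conversion code):

```python
import numpy as np

# float16's finite range tops out at 65504.
print(np.finfo(np.float16).max)   # 65504.0

# Values beyond that overflow to inf when cast down.
print(np.float16(70000.0))        # inf

# Casting a simulated fp32 activation tensor: in-range values survive
# (with rounding), out-of-range values become inf.
x = np.array([1.0, 5e4, 7e4], dtype=np.float32)
print(x.astype(np.float16))
```

If random calibration data happens not to trigger the largest activations, a conversion can look fine for one checkpoint (t5-small) and break for another (t5-large).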

ayoub-louati commented 2 years ago

@brevity2021 we are working on adding support for T5 conversion through the convert script. I think it should cover precision handling for the different T5 models (including t5-large).
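A common way such conversion scripts handle precision (I'm not asserting this is what transformer-deploy's script does) is mixed precision: run the fp32 model on sample inputs, record which nodes produce activations near the fp16 range, and keep only those nodes in fp32. A hypothetical sketch of the selection step, with made-up node names:

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0

def nodes_to_keep_fp32(activations, margin=0.9):
    """Given {node_name: fp32 activation array} captured on sample inputs,
    return names of nodes whose values come close enough to the fp16
    range that they should stay in fp32 during mixed-precision export."""
    return [
        name
        for name, tensor in activations.items()
        if np.abs(tensor).max() > margin * FP16_MAX
    ]

# Hypothetical activations captured from two nodes:
acts = {
    "encoder.block.0.SelfAttention.matmul": np.array([1.0, 2.0]),
    "decoder.block.7.DenseReluDense.wo": np.array([1.0, 8e4]),
}
print(nodes_to_keep_fp32(acts))
# -> ['decoder.block.7.DenseReluDense.wo']
```

The more representative the sample inputs are, the more reliable the node selection, which is why purely random calibration data can miss overflowing nodes.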