ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0
1.64k stars, 150 forks

Switch T5 notebook to 3B model #95

Closed: pommedeterresautee closed this issue 2 years ago

pommedeterresautee commented 2 years ago

Right now the T5 notebook is based on the large flavor. There were 2 blocking issues with the 3B flavor: an ORT bug and the memory footprint.

The ORT bug has been fixed by https://github.com/microsoft/onnxruntime/pull/11650 (context: https://github.com/microsoft/onnxruntime/issues/11511).

The only remaining blocker is the memory footprint. It comes from loading the decoder weights twice during the conversion (once with cache support, once without). We can avoid that by using the If trick during the conversion, like we already do after it.
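To make the idea concrete, here is a rough toy model of why the If trick helps. It is not the actual conversion code and uses no ONNX API: the two decoders are plain dicts of raw bytes, and every name and size below (`decoder_no_cache`, `decoder_with_cache`, `merge_with_if`, `select_branch`, the 1 KiB weights) is hypothetical. It only assumes that both decoder exports carry identical weights, so a single parent structure can hold one copy that both branches reference.

```python
import hashlib

# Toy stand-ins for the two exported decoders: name -> raw weight bytes.
# All names and sizes are hypothetical, for illustration only.
shared = {f"block.{i}.weight": bytes([i]) * 1024 for i in range(4)}
decoder_no_cache = {f"no_cache/{k}": v for k, v in shared.items()}
decoder_with_cache = {f"with_cache/{k}": v for k, v in shared.items()}


def total_bytes(*graphs):
    """Sum the size of every weight stored in the given graphs."""
    return sum(len(w) for g in graphs for w in g.values())


def merge_with_if(then_branch, else_branch):
    """Hoist identical weights into one parent table shared by both branches."""
    parent = {}  # content hash -> weight bytes, stored once

    def hoist(branch):
        refs = {}
        for name, weight in branch.items():
            key = hashlib.sha256(weight).hexdigest()
            parent.setdefault(key, weight)  # keep only the first copy
            refs[name] = key
        return refs

    return {
        "weights": parent,
        "then": hoist(then_branch),
        "else": hoist(else_branch),
    }


def select_branch(merged, use_cache):
    """Mimic an If node: pick one branch and resolve its weights from the parent."""
    refs = merged["then"] if use_cache else merged["else"]
    return {name: merged["weights"][key] for name, key in refs.items()}


merged = merge_with_if(decoder_with_cache, decoder_no_cache)
separate_size = total_bytes(decoder_no_cache, decoder_with_cache)
merged_size = total_bytes(merged["weights"])
print(separate_size, merged_size)  # 8192 4096
```

In this toy setup the merged structure stores each weight once, so the footprint is halved while both branches still resolve to the same tensors. Doing the merge during the conversion, rather than after it, would avoid ever holding both copies at the same time.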