ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0
1.64k stars, 150 forks

Llama support #170

Open · michaelroyzen opened 1 year ago

michaelroyzen commented 1 year ago

Would it be possible to run Llama using this? Could the GPT-2 example be adapted to run Llama on TensorRT?

ktl014 commented 1 year ago

Same question here. I'm very interested in getting LLMs like Llama to work on Triton. CC @pommedeterresautee

tikikun commented 1 year ago

Also very interested