ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

Torch 2.0 #169

Open varshith15 opened 1 year ago

varshith15 commented 1 year ago

Is there a way to leverage torch 2.0's `torch.compile` with TensorRT as a backend directly, without all of the current tedious process? https://pytorch.org/docs/stable/dynamo/get-started.html

And any thoughts on torch 2.0 in general? Has anyone tried it out? I've tried it on a few transformer models, and there doesn't seem to be any improvement. @pommedeterresautee @ayoub-louati

pommedeterresautee commented 1 year ago

Yes, TensorRT is supported out of the box as a `torch.compile` backend. However, it adds its own overhead and is not always the best choice in my tests. Kernl runs on top of PyTorch 2.0. For now, 2.0 mostly targets training rather than inference.
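For reference, the "direct" route looks roughly like this — a minimal sketch, assuming PyTorch 2.0 is installed. The TensorRT path (commented out) additionally assumes the `torch_tensorrt` package and a CUDA GPU; the backend name is my understanding of that package, not something from this repo:

```python
import torch

# Toy model -- any nn.Module goes through torch.compile the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 8),
).eval()

x = torch.randn(4, 16)

# Swapping the compilation target is just a backend string.
# "eager" is a debug backend that runs the captured graph as-is,
# used here so the sketch works without a GPU or C++ toolchain.
compiled = torch.compile(model, backend="eager")
with torch.no_grad():
    out = compiled(x)

# With torch_tensorrt installed (assumption -- requires a CUDA GPU):
#   import torch_tensorrt  # registers the TensorRT backend
#   compiled_trt = torch.compile(model, backend="tensorrt")

print(out.shape)  # torch.Size([4, 8])
```

The point of the issue is that this one-liner replaces the manual ONNX export + TensorRT engine build pipeline, at the cost of the extra overhead mentioned above.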

varshith15 commented 1 year ago

@pommedeterresautee any thoughts on Apache tvm?

pommedeterresautee commented 1 year ago

TVM was best for non-GPU workloads. Recently they started supporting GPUs better through CUTLASS, plus adding the possibility to program at the block-of-threads level (CTAs), but IMO Triton is a better choice for now when Nvidia hardware is your target.