Godlovecui opened this issue 3 weeks ago
We are working on a tutorial for inference with Gemma: https://github.com/NVIDIA/TransformerEngine/blob/5cb8ed4d129245357363361947e5b1d31c543783/docs/examples/te_gemma/tutorial_generation_gemma_with_te.ipynb. We're still tweaking it, so we'd appreciate any feedback at https://github.com/NVIDIA/TransformerEngine/pull/829.
Hi, can TransformerEngine be compiled into a pip package? I want to use TransformerEngine in vLLM. Thank you~ @timmoon10
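For what it's worth, TransformerEngine can already be installed with pip rather than compiled by hand — a sketch of the two documented routes (the PyPI extra name and the `@stable` branch are taken from the TE README; check it for the CUDA/cuDNN prerequisites your system needs):

```shell
# Option 1: prebuilt package from PyPI with the PyTorch integration
pip install transformer_engine[pytorch]

# Option 2: build the latest stable branch from source
pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
```

Whether vLLM picks the installed package up automatically is a separate question for the vLLM side; this only makes `transformer_engine` importable in your environment.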
ENV: 8× RTX 4090
I want to test FP8 inference with TransformerEngine on Llama 3 (from Hugging Face), but I cannot find any instructions on inference. Can you share some code? Thank you~