support Llama 3.1 8B Instruct with SGLang and TensorRT LLM

zhyncs commented 3 weeks ago

This PR adds experimental SGLang support for future benchmarking, such as comparisons with vLLM and TensorRT LLM.

Hi @pankajroark @philipkiely-baseten @squidarth @aspctu, could you help review this PR? Any suggestions are welcome. Thanks.

Llama 3.1 70B and FP8 versions will be completed in separate PRs. The benchmark script will also be completed in a separate PR, similar to https://github.com/sgl-project/sglang/blob/main/python/sglang/bench_serving.py

zhyncs commented 3 weeks ago

I used the implementation at https://github.com/basetenlabs/truss-examples/tree/main/llama/llama-3-8b-instruct as a reference, replaced the vLLM API calls with SGLang's, and made some modifications.

zhyncs commented 3 weeks ago

Thanks @pankajroark for recommending https://github.com/basetenlabs/Workshop-TRT-LLM/tree/main/03_benchmark yesterday. I tried it with SGLang and vLLM, and the basic benchmark runs. I plan to make some changes to meet my benchmark needs.

zhyncs commented 3 weeks ago

Hi @pankajroark @philipkiely-baseten, I think this is ready to merge. Could you help merge this PR? Thanks!

zhyncs commented 3 weeks ago

cc @joostinyi
