Open taewan2002 opened 4 months ago
I am trying to run the benchmarks on an Nvidia Orin 64GB machine because I lack other GPU resources, but it is too slow, so I would appreciate it if you could add TensorRT-LLM support. 🤣
Hello!
We don't currently support TRT-LLM, though we do support vLLM, which should be faster than the HF backend.
We'd however welcome a contribution adding TRT-LLM!