-
@npuichigo I am trying to use [Triton Inference Server with TensorRT-LLM backend](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html#deploy-with-triton-inference-server) with [openweb-ui](ht…
-
I completed the concurrency test with the **TensorRT + Triton Server** deployment, and throughput under concurrency roughly doubled compared to faster-whisper.
I am testing its accuracy, but the Chinese tr…
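A concurrency comparison like the one above can be approximated by timing the same batch of requests serially and in parallel. A minimal sketch follows; the `transcribe` stub is a placeholder (an assumption, not the actual client code) that would normally POST audio to the Triton/TensorRT endpoint, with a short sleep standing in for server-side latency:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def transcribe(audio_id):
    """Stand-in for one transcription request.

    A real test would send audio to the Triton/TensorRT server;
    here a short sleep simulates the request round-trip.
    """
    time.sleep(0.05)
    return audio_id

def throughput(n_requests, workers):
    """Return requests/second for n_requests issued over `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(transcribe, range(n_requests)))
    return n_requests / (time.perf_counter() - start)

serial = throughput(20, workers=1)
parallel = throughput(20, workers=4)
print(f"serial: {serial:.1f} req/s, parallel: {parallel:.1f} req/s")
```

With a real backend, the ratio between the two numbers shows how well the server overlaps concurrent requests, which is what the faster-whisper comparison is measuring.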
-
This is a PR on the upstream `openai/triton`: https://github.com/openai/triton/pull/2629
which uses a non-anonymous email-sending function to avoid being intercepted, e.g. a mail connection string: `smtp+…`
-
**Is your feature request related to a problem? Please describe.**
The vLLM backend works well and is easy to set up, compared to TensorRT, which had me pulling my hair out.
However it lacks the OpenAI co…
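For reference, an OpenAI-compatible serving interface boils down to accepting POSTs to `/v1/chat/completions` shaped like the payload below. The base URL and model id here are placeholders, not a real deployment:

```python
import json

BASE_URL = "http://localhost:8000"    # hypothetical server address
ENDPOINT = "/v1/chat/completions"     # OpenAI-compatible route

# Request body in the shape the OpenAI Chat Completions API defines.
payload = {
    "model": "my-local-model",        # placeholder model id
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
    "stream": False,
}

body = json.dumps(payload)
print(f"POST {BASE_URL}{ENDPOINT}\n{body}")
```

Clients such as the official `openai` SDK only need the server to honor this route and schema, which is why a compatibility layer on top of an existing backend is usually enough.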
-
I need to use locally deployed LLMs for evaluation within my current setup. While setting up LLM monitoring with Phoenix, I need to run evaluations against the traces, but I am only able to find [evaluation llm…
-
I have converted Mixtral to TensorRT and I am trying to use your repository to integrate with OpenAI.
I'm using the template history_template_llama3.liquid. When I run your example code for interactin…
-
I don't know what's going on; it's reporting this kind of error. Everything was normal before training, then this problem suddenly occurred. Can you help me take a look?
2024-04-20 08:27:16.276530: Epoch 600…
-
# Enhancement
Use the [Triton](https://triton-lang.org/main/index.html) compiler from OpenAI to accelerate model training.
-
Hello, I want to deploy a quantized Llama-3-8B model using tritonserver. I followed the steps below:
1. Create a container from the `nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3` base image.
3.…
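For context, a Triton model repository for a TensorRT-LLM engine generally follows a layout like the sketch below. Directory names and config fields here are illustrative; the exact `config.pbtxt` parameters depend on the backend version being used:

```
model_repository/
└── tensorrt_llm/
    ├── config.pbtxt
    └── 1/
        └── (compiled TensorRT engine files)
```

A minimal `config.pbtxt` header for that model directory might look like:

```
name: "tensorrt_llm"
backend: "tensorrtllm"
max_batch_size: 8
```

The server is then pointed at the repository root with `tritonserver --model-repository=/path/to/model_repository`.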
-
### The high level motivation
Some real world PyTorch benchmarks that we would like to run are at: https://github.com/pytorch/benchmark/tree/64409d5704b6136c6cb28071ff8eba61751b1b02/torchbenchmark/…