-
This issue tracks an internal discussion with QA. The simple snippet below shows why using `cuda.core` on Windows today might fail, depending on whether the GPU is in TCC or WDDM mode:
```python
>>> from cuda import cud…
```
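For anyone triaging locally, here is a quick way to check which driver model each Windows GPU is in. This is only a sketch; it assumes `nvidia-smi` is on `PATH` and that the installed driver exposes the `driver_model.current` query field (Windows-only).

```python
import subprocess

# Ask nvidia-smi for the current Windows driver model (TCC or WDDM) per GPU.
result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,driver_model.current",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.strip().splitlines():
    index, model = (part.strip() for part in line.split(","))
    print(f"GPU {index}: {model}")
```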
-
### System Info
- GPU: H100
- Triton Server with TensorRT backend (v0.10.0)
- Launched on K8s. Docker container built using the [TensorRT builder](https://github.com/triton-inference-server/tensorrt…
-
Hey,
I tried to run ColBERT model inference via Triton Server on a multi-GPU instance.
GPU 0 works fine. However, the other GPU devices (1, 2, 3, etc.) crash when execution reaches this line:
```
D_pac…
```
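Not the reporter's code, but in multi-GPU Triton deployments a frequent cause of "GPU 0 works, other devices crash" is model code that implicitly targets `cuda:0` instead of the GPU Triton assigned to the instance. A minimal Python-backend sketch of the device-aware pattern (assumes a custom `model.py`; the actual ColBERT loading and scoring are omitted):

```python
import torch


class TritonPythonModel:
    """Sketch only -- not the failing model.py."""

    def initialize(self, args):
        # Triton tells each model instance which GPU it was placed on;
        # hard-coding cuda:0 (or calling .cuda() with no index) makes
        # instances placed on GPUs 1..N fail or collide on device 0.
        device_id = int(args["model_instance_device_id"])
        self.device = torch.device(f"cuda:{device_id}")

    def execute(self, requests):
        # Create and move all tensors on self.device, never an implicit
        # "cuda:0"; the real ColBERT scoring code would go here.
        ...
```

Which GPUs receive instances is controlled by `instance_group { kind: KIND_GPU }` in the model's `config.pbtxt`.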
-
**Description**
The `nv_inference_pending_request_count` metric exported by tritonserver is incorrect in ensemble_stream mode.
The ensemble_stream pipeline contains 3 steps: preprocess, fastertra…
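For reference, a small sketch for watching the metric while requests are in flight; it assumes the default metrics endpoint on port 8002:

```python
import urllib.request

# Scrape Triton's Prometheus endpoint and print only the pending-request
# gauge, so the ensemble count can be compared against its composing steps.
with urllib.request.urlopen("http://localhost:8002/metrics") as resp:
    for line in resp.read().decode().splitlines():
        if line.startswith("nv_inference_pending_request_count"):
            print(line)
```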
-
**Is your feature request related to a problem? Please describe.**
I find the current docker image `xx.yy-py3` doesn't include commonly used data preprocessing libraries like Hugging Face transformers f…
-
The Triton Inference Server supports TensorRT models, and the Triton ServingRuntime [indicates this](https://github.com/kserve/modelmesh-serving/blob/main/config/runtimes/triton-2.x.yaml#L28).
…
-
There is an example at https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/qwenvl, but I have no idea how to use this model in Triton Server. Can you provide an example of a visual language mod…
-
**Description**
The Python backend does not properly load the `model.py` file in the model directory when trailing slashes (`/`) are present in the `--backend-directory` option.
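Not confirmed as the root cause, but here is a small illustration of how a trailing slash breaks naive path handling: any code that derives the backend name from the last path component silently gets an empty string.

```python
import os

# Without a trailing slash the last component is the backend name...
print(os.path.basename("/opt/tritonserver/backends/python"))   # -> "python"
# ...with one, basename() returns "", which is the kind of thing that
# silently breaks downstream name/model.py lookups.
print(os.path.basename("/opt/tritonserver/backends/python/"))  # -> ""
# Normalizing the path first avoids the surprise.
print(os.path.basename(os.path.normpath("/opt/tritonserver/backends/python/")))  # -> "python"
```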
**Triton Informa…
-
Hello everyone,
I encountered an error message (as shown below) while trying to run the Mamba model (code below).
Experimental environment:
CUDA 11.8 + PyTorch 2.0.0 + Triton 2.2.0
What should…
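Not a diagnosis of the error itself, but PyTorch releases pin a matching Triton version (PyTorch 2.0.0 ships with Triton 2.0.x, not 2.2), so a quick environment check like the sketch below often narrows things down:

```python
import torch
import triton

# Print the versions the failing script actually imports; mismatched
# torch/triton pairs are a common source of kernel build/runtime errors.
print("torch :", torch.__version__, "| CUDA:", torch.version.cuda)
print("triton:", triton.__version__)
print("GPU   :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```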
-
**Description**
An application that makes use of its own version of gRPC, protobuf, etc., can experience symbol conflicts (leading to tricky-to-diagnose crashes) with respect to the versions of those symbols …