-
This issue tracks an internal discussion with QA. The simple snippet below shows why using `cuda.core` on Windows today might fail, depending on whether the GPU is in TCC or WDDM mode:
```python
>>> from cuda import cud…
```
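For anyone triaging locally, here is a quick way to check which driver model each Windows GPU is in. This is only a sketch; it assumes `nvidia-smi` is on `PATH` and that the installed driver exposes the `driver_model.current` query field (Windows-only).

```python
import subprocess

# Ask nvidia-smi for the current Windows driver model (TCC or WDDM) per GPU.
result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,driver_model.current",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.strip().splitlines():
    index, model = (part.strip() for part in line.split(","))
    print(f"GPU {index}: {model}")
```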
-
### System Info
- GPU: H100
- Triton Server with TensorRT backend (v0.10.0)
- Launched on K8s. Docker container built using the [TensorRT builder](https://github.com/triton-inference-server/tensorrt…
-
Hey,
I tried to run ColBERT model inference via Triton Server on a multi-GPU instance.
GPU 0 works fine. However, the other GPU devices (1, 2, 3, etc.) crash when execution reaches this line:
```
D_pac…
```
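Not the reporter's code, but in multi-GPU Triton deployments a frequent cause of "GPU 0 works, other devices crash" is model code that implicitly targets `cuda:0` instead of the GPU Triton assigned to the instance. A minimal Python-backend sketch of the device-aware pattern (assumes a custom `model.py`; the actual ColBERT loading and scoring are omitted):

```python
import torch


class TritonPythonModel:
    """Sketch only -- not the failing model.py."""

    def initialize(self, args):
        # Triton tells each model instance which GPU it was placed on;
        # hard-coding cuda:0 (or calling .cuda() with no index) makes
        # instances placed on GPUs 1..N fail or collide on device 0.
        device_id = int(args["model_instance_device_id"])
        self.device = torch.device(f"cuda:{device_id}")

    def execute(self, requests):
        # Create and move all tensors on self.device, never an implicit
        # "cuda:0"; the real ColBERT scoring code would go here.
        ...
```

Which GPUs receive instances is controlled by `instance_group { kind: KIND_GPU }` in the model's `config.pbtxt`.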
-
**Description**
The `nv_inference_pending_request_count` metric exported by tritonserver is incorrect in ensemble_stream mode.
The ensemble_stream pipeline contains 3 steps: preprocess, fastertra…
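For reference, a small sketch for watching the metric while requests are in flight; it assumes the default metrics endpoint on port 8002:

```python
import urllib.request

# Scrape Triton's Prometheus endpoint and print only the pending-request
# gauge, so the ensemble count can be compared against its composing steps.
with urllib.request.urlopen("http://localhost:8002/metrics") as resp:
    for line in resp.read().decode().splitlines():
        if line.startswith("nv_inference_pending_request_count"):
            print(line)
```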
-
**Is your feature request related to a problem? Please describe.**
I find the current docker image `xx.yy-py3` doesn't include commonly used data preprocessing libraries like Hugging Face transformers f…
-
The Triton Inference Server supports TensorRT models, and the Triton ServingRuntime [indicates this](https://github.com/kserve/modelmesh-serving/blob/main/config/runtimes/triton-2.x.yaml#L28).
…
-
There is an example at https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/qwenvl, but I have no idea how to use this model in Triton Server. Can you provide an example of a visual language mod…
-
**Description**
The Python backend does not properly load the `model.py` file in the model directory when trailing slashes (`/`) are present in the `--backend-directory` option.
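Not confirmed as the root cause, but here is a small illustration of how a trailing slash breaks naive path handling: any code that derives the backend name from the last path component silently gets an empty string.

```python
import os

# Without a trailing slash the last component is the backend name...
print(os.path.basename("/opt/tritonserver/backends/python"))   # -> "python"
# ...with one, basename() returns "", which is the kind of thing that
# silently breaks downstream name/model.py lookups.
print(os.path.basename("/opt/tritonserver/backends/python/"))  # -> ""
# Normalizing the path first avoids the surprise.
print(os.path.basename(os.path.normpath("/opt/tritonserver/backends/python/")))  # -> "python"
```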
**Triton Informa…
-
Hello everyone,
I encountered an error message (as shown below) while trying to run the Mamba model (code below).
Experimental environment:
CUDA 11.8 + PyTorch 2.0.0 + Triton 2.2.0
What should…
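Not a diagnosis of the error itself, but PyTorch releases pin a matching Triton version (PyTorch 2.0.0 ships with Triton 2.0.x, not 2.2), so a quick environment check like the sketch below often narrows things down:

```python
import torch
import triton

# Print the versions the failing script actually imports; mismatched
# torch/triton pairs are a common source of kernel build/runtime errors.
print("torch :", torch.__version__, "| CUDA:", torch.version.cuda)
print("triton:", triton.__version__)
print("GPU   :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```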
-
**Description**
An application that makes use of its own version of gRPC, protobuf, etc., can experience symbol conflicts (leading to tricky-to-diagnose crashes) with respect to the versions of those symbols …