-
Can I specify a particular version to load or unload when using Triton Inference Server for model management?
I only found the following two APIs:
Load model: v2/repository/models/{model-name}/load
…
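For reference, Triton's model-repository extension lets the load endpoint take an optional JSON body whose `config` parameter is a model-config override; pinning a version via a `version_policy` override is one documented way to load a specific version. Below is a minimal sketch that only builds the request URL and body (the host `localhost:8000` and the model name `my_model` are placeholders; nothing is sent, and this has not been verified against a live server):

```python
import json

def build_load_request(model_name, version):
    """Build the URL and JSON body for Triton's repository load API,
    overriding the model config so that only `version` is served.
    Sketch based on Triton's model-repository protocol extension."""
    url = f"http://localhost:8000/v2/repository/models/{model_name}/load"
    # "config" must be a JSON-encoded model-configuration override.
    config = {"version_policy": {"specific": {"versions": [version]}}}
    body = {"parameters": {"config": json.dumps(config)}}
    return url, body

url, body = build_load_request("my_model", 3)
```

The resulting `body` would then be POSTed to `url`, e.g. with `requests.post(url, json=body)`.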
-
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 111
CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_c…
-
### The bug
Hello, I get this error when copying the image after using the next and previous buttons.
https://github.com/user-attachments/assets/bc878a6f-94d8-47d4-bc92-b77aafa59dcc
### The OS …
-
## Bug Report
Does TensorFlow Serving support XLA-compiled SavedModels, or am I doing something wrong?
### System information
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: [D…
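For context, a minimal sketch of how such a SavedModel might be produced (assuming TF 2.x, where `jit_compile=True` on `tf.function` requests XLA compilation; the module, values, and export path are placeholders, not taken from the report):

```python
import tensorflow as tf

class AddOne(tf.Module):
    # jit_compile=True marks this function for XLA compilation.
    @tf.function(jit_compile=True,
                 input_signature=[tf.TensorSpec([None], tf.float32)])
    def serve(self, x):
        return x + 1.0

module = AddOne()
# Export with an explicit serving signature, as TF Serving expects.
tf.saved_model.save(module, "/tmp/add_one_xla",
                    signatures={"serving_default": module.serve})
```

Whether the server then runs the XLA-compiled version is exactly the question posed above.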
-
Hi experts,
I'm running a 1.3B model on Windows with a 16GB V100 and the environment below, but hit an issue for which I couldn't find any clue. Could you please help check it?
TensorRT-LLM version: tag v0.10.0…
-
I use the Docker image chat-ui-db as the frontend, text-generation-inference as the inference backend, and meta-llama/Llama-2-70b-chat-hf as the model.
In the model field of the .env.local file, I hav…
-
There is a BERT-based model used at Ant Group for geographic-similarity-comparison inference.
https://modelscope.cn/models/damo/mgeo_geographic_entity_alignment_chinese_base/summary
https://modelscope.cn/m…
-
## Description
There exists a strange combination of factors under which inference can cause the reasoner to behave as if it ignores a `not` subquery, leading to incorrect results.
I have a query that c…
-
**Description**
I have a 5-step ensemble pipeline for Triton:
* 3 steps are TorchScript artifacts
* 2 steps are TensorRT-compiled models
In the pbtxt files I have:
```
instance_group [{ kind: KIN…
-
If I test it with a concurrency of 2, it runs into an error. The error detail is:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1473,…