-
### System Info
I am working on the benchmarking suite on the vLLM team and am now trying to run TensorRT-LLM for comparison. I am relying on this GitHub repo (https://github.com/neuralmagic/tensorrt-demo)…
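For a quick sanity check against that kind of deployment, a single request can confirm the server responds before any benchmarking. This is only a sketch: the `ensemble` model name, port 8000, and the payload fields follow common Triton + TensorRT-LLM generate-endpoint conventions and may not match what the demo repo actually configures.

```
# Hedged sketch: querying a Triton-served TensorRT-LLM model through the
# HTTP generate extension. Model name, port, and field names are
# assumptions based on typical TRT-LLM backend setups, not the demo repo.
import requests

url = "http://localhost:8000/v2/models/ensemble/generate"
payload = {
    "text_input": "What is the capital of France?",
    "max_tokens": 64,
    "bad_words": "",
    "stop_words": "",
}
resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json().get("text_output"))
```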
-
Is there any client API for LTU-AS (13B)?
I cannot find the 13B checkpoints in the GitHub repo, and the API only supports "7B (Default)", not "13B (Beta)".
-
### System Info
Hi,
I noticed there is no Slack, Discord, or IRC channel for TensorRT. Discussing things in a channel could offload some future tickets, so I created one.
I hope its…
-
### System Info
pandasai==2.2.14
Python 3.10.12
### 🐛 Describe the bug
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig
model_id = "hugging-quants/Meta…
-
I'm using the nvcr.io/nvidia/tritonserver:23.10-py3 container for my inference, via the C++ gRPC API. There are several models in the container: a YOLOv8-like architecture in TensorRT plus a few TorchScript model…
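The request shape is easiest to show with Triton's Python gRPC client (the C++ client mirrors it one-to-one). Everything model-specific below (the model name and the `images`/`output0` tensor names) is an assumption for a typical YOLOv8-style export, not taken from the post.

```
# Hedged sketch of a Triton gRPC inference call using the Python client;
# the C++ client follows the same request structure. The model name and
# the "images"/"output0" tensor names are assumptions for a typical
# YOLOv8-style export.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)  # NCHW dummy input
infer_input = grpcclient.InferInput("images", list(dummy.shape), "FP32")
infer_input.set_data_from_numpy(dummy)

result = client.infer(model_name="yolov8", inputs=[infer_input])
print(result.as_numpy("output0").shape)
```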
-
We are using Triton Inference Server for model inference and are currently facing throughput bottlenecks with LLM inference. I saw in a public video that NVIDIA has optimized LLM serving by supporting `In…
-
**ON server:**
```
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:3013
INFO:w…
```
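Incidentally, the warning in that log is actionable: werkzeug's development server is not meant for deployment. A minimal sketch of swapping in a production WSGI server (waitress here; the app module name is hypothetical):

```
# Hedged sketch: serving the Flask app through waitress instead of the
# werkzeug development server. "myapp" is a hypothetical module name;
# host and port are taken from the log above.
from waitress import serve

from myapp import app  # hypothetical import of the Flask app

serve(app, host="127.0.0.1", port=3013)
```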
-
**Kibana version:** 8.14.0-SNAPSHOT
**Elasticsearch version:** 8.14.0-SNAPSHOT
**Server OS version:** OSX 14.3
**Original install method (e.g. download page, yum, from source, etc.):** sour…
-
### 🐛 Describe the bug
TorchServe version is 0.10.0.
Here is my code:
```
def get_inference_stub(address: str, port: Union[str, int] = 7070):
    channel = grpc.insecure_channel(address + ':' + str(p…
```
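For context, the usual TorchServe gRPC client pattern looks roughly like the sketch below, assuming `inference_pb2` and `inference_pb2_grpc` have been generated from TorchServe's proto files and are importable; the model name and payload are illustrative.

```
# Hedged sketch of a TorchServe gRPC client, assuming inference_pb2 and
# inference_pb2_grpc were generated from TorchServe's proto files.
# The model name "mymodel" and the payload are illustrative.
from typing import Union

import grpc
import inference_pb2
import inference_pb2_grpc

def get_inference_stub(address: str, port: Union[str, int] = 7070):
    channel = grpc.insecure_channel(address + ':' + str(port))
    return inference_pb2_grpc.InferenceAPIsServiceStub(channel)

def infer(stub, model_name: str, data: bytes):
    request = inference_pb2.PredictionsRequest(
        model_name=model_name, input={"data": data}
    )
    return stub.Predictions(request)

stub = get_inference_stub("localhost")
print(infer(stub, "mymodel", b"example payload"))
```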
-
```
Server -> Receiving message of size: 24883378
Server -> 24883378 bytes read
Server -> Message parsed
Server -> Received inference request
Server -> Requesting inference on model: densepose
Server…
```