-
When using perf_analyzer to analyze a Python decoupled model such as [triton-decoupled](https://github.com/Jackiexiao/triton-decoupled-cache) with the command below:
```
perf_analyzer -i grpc --streami…
```
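For context, decoupled models can only be exercised over the streaming gRPC interface, which is what `-i grpc --streaming` selects in perf_analyzer. The sketch below shows the equivalent interaction with `tritonclient`; the model name `decoupled_cache` and the tensor name `INPUT0` are assumptions (the original command is truncated), so substitute your own.

```python
# Minimal sketch: streaming gRPC inference against a decoupled model.
# Model and tensor names are hypothetical placeholders.
import queue
import numpy as np
import tritonclient.grpc as grpcclient

results = queue.Queue()

def callback(result, error):
    # A decoupled model may send zero, one, or many responses per request.
    results.put(error if error else result)

client = grpcclient.InferenceServerClient("localhost:8001")
client.start_stream(callback=callback)

text = np.array([b"hello"], dtype=np.object_)
inp = grpcclient.InferInput("INPUT0", text.shape, "BYTES")
inp.set_data_from_numpy(text)
client.async_stream_infer(model_name="decoupled_cache", inputs=[inp])

client.stop_stream()  # blocks until pending responses are flushed
client.close()

while not results.empty():
    print(results.get())
```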
-
Hi everyone,
I'm running Triton Server with the vLLM backend and want to use dynamic batching, but I encountered an error. It seems to have something to do with my input.
Inference with curl:
```
curl -X POST loca…
```
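Since the command above is cut off, here is a hedged Python sketch of the general shape of a generate request to the Triton vLLM backend; the model name `vllm_model`, the port, and the sampling parameters are assumptions, not taken from the original report.

```python
# Sketch of a generate request to a Triton vLLM model (names assumed).
import json
import requests

payload = {
    "text_input": "What is dynamic batching?",  # input name used by the vLLM backend
    "parameters": {"stream": False, "temperature": 0.7, "max_tokens": 64},
}
resp = requests.post(
    "http://localhost:8000/v2/models/vllm_model/generate",  # model name is an assumption
    data=json.dumps(payload),
)
resp.raise_for_status()
print(resp.json()["text_output"])
```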
-
**Is your feature request related to a problem? Please describe.**
When writing the `model.py` file for a Python backend model, it is very difficult to correctly use `triton_python_backend_utils` (ak…
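For reference, a minimal `model.py` exercising the commonly used parts of `triton_python_backend_utils` might look like the sketch below; the tensor names `INPUT0`/`OUTPUT0` are placeholders and must match your `config.pbtxt`.

```python
# Minimal Python backend model.py sketch; tensor names are placeholders.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args is a dict of strings, e.g. args["model_config"] is a JSON string.
        pass

    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            result = in0.as_numpy() * 2  # placeholder computation
            out0 = pb_utils.Tensor("OUTPUT0", result.astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses

    def finalize(self):
        pass
```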
-
**Is your feature request related to a problem? Please describe.**
I aim to deploy my ASR model on a server that will receive audio packet bytes with each request. The server will then transcribe the…
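A common way to ship raw audio packet bytes to such a server is a BYTES input tensor. The following is a sketch using `tritonclient` over HTTP; the model name `asr_model` and the tensor names `AUDIO_BYTES`/`TRANSCRIPT` are hypothetical.

```python
# Sketch: sending raw audio bytes to a hypothetical ASR model over HTTP.
import numpy as np
import tritonclient.http as httpclient

with open("sample.wav", "rb") as f:
    audio_bytes = f.read()

client = httpclient.InferenceServerClient("localhost:8000")
data = np.array([audio_bytes], dtype=np.object_)
inp = httpclient.InferInput("AUDIO_BYTES", data.shape, "BYTES")  # name assumed
inp.set_data_from_numpy(data)
result = client.infer(model_name="asr_model", inputs=[inp])      # name assumed
print(result.as_numpy("TRANSCRIPT"))                             # output name assumed
```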
-
Hello, I wanted to ask whether it is possible to create in-place operations. I have a pretty big DALI pipeline (in terms of image size) and I have to preprocess data, but each operation creates a copy…
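For context, DALI operators are functional: each `fn.*` stage in a pipeline produces a new output buffer rather than modifying its input, which is where the copies come from. A minimal sketch of such a chained pipeline (file paths and sizes are placeholders):

```python
# Sketch: a chained DALI preprocessing pipeline; every fn.* stage
# writes to a new buffer (no in-place variants). Paths/sizes are placeholders.
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types


@pipeline_def(batch_size=4, num_threads=2, device_id=0)
def preprocess():
    encoded, labels = fn.readers.file(file_root="/data/images")
    images = fn.decoders.image(encoded, device="mixed")           # new buffer
    images = fn.resize(images, resize_x=4096, resize_y=4096)      # new buffer
    images = fn.crop_mirror_normalize(images, dtype=types.FLOAT)  # new buffer
    return images, labels


pipe = preprocess()
pipe.build()
images, labels = pipe.run()
```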
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', bu…
```
-
I just followed the steps of commands shown in the README:
1. apt install git-lfs
   git lfs install --skip-repo
   git clone https://github.com/NVIDIA-AI-IOT/deepstream_parallel_inference_app.git
2. apt-get…
-
[RFD27/Container Monitor](https://github.com/joyent/rfd/blob/master/rfd/0027/README.md) integration requires two things:
1. TLS certs based on a user's SSH key
2. Discovery of RFD27 endpoints
### Auth…
-
Can I use LightSeq to speed up a fairseq Transformer decoder model?
I have already exported the Transformer decoder language model trained with fairseq, and now I want to speed up the model with LightSeq …
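If the export produced a LightSeq-compatible protobuf, inference usually goes through `lightseq.inference`, following the pattern from the LightSeq examples. The file name and token IDs below are placeholders, and for a decoder-only language model the wrapper class may differ from the encoder-decoder `Transformer` shown here.

```python
# Sketch: running an exported Transformer model with LightSeq.
# File name and token IDs are placeholders.
import lightseq.inference as lsi

model = lsi.Transformer("lightseq_transformer.pb", 8)  # max batch size 8
tokens = [[4, 17, 23, 2]]  # placeholder token IDs
output = model.infer(tokens)
print(output)
```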
-
### System Info
CPU Architecture: x86_64
CPU/Host memory size: 1024Gi (1.0Ti)
GPU properties:
GPU name: NVIDIA GeForce RTX 4090
GPU mem size: 24Gb…