-
Hi NJU-Jet,
My Linux server has several 2.6 GHz CPUs and several V100 GPUs. I ran **generate_tflite.py** to get a quantized model,
and then in the **evaluate** function I added the code below to measu…
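The truncated snippet above is presumably timing the quantized model inside `evaluate`. As a hedged sketch of that kind of measurement (the `invoke` callable below is a stand-in for whatever the real loop calls, e.g. a TFLite interpreter's `invoke()`; the helper name is hypothetical), a warm-up phase followed by repeated timed runs avoids counting one-time setup cost:

```python
import time
import statistics

def time_inference(invoke, warmup=5, runs=50):
    """Time a zero-argument inference callable.

    `invoke` stands in for the real call (e.g. a TFLite
    interpreter's `invoke()`); warm-up iterations are excluded
    so one-time allocation cost doesn't skew the numbers.
    Returns (mean_ms, median_ms).
    """
    for _ in range(warmup):
        invoke()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        invoke()
        samples.append((time.perf_counter() - t0) * 1000.0)  # ms
    return statistics.mean(samples), statistics.median(samples)

# Usage with a dummy workload standing in for model inference:
mean_ms, median_ms = time_inference(lambda: sum(range(10000)))
print(f"mean {mean_ms:.3f} ms, median {median_ms:.3f} ms")
```

Reporting the median alongside the mean helps when a few runs are slowed by other processes on the machine.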
-
### Description
```shell
Docker: nvcr.io/nvidia/tritonserver:23.04-py3
Gpu: A100
```
How can I stop bi-directional streaming (decoupled mode)?
- I want to stop model inference (the streaming response) when …
-
### System Info
```shell
AWS EC2 instance: trn1.32xlarge
OS: Ubuntu 22.04.4 LTS
Platform:
- Platform: Linux-6.5.0-1023-aws-x86_64-with-glibc2.35
- Python version: 3.10.12
Python packages:
…
-
Thank you for the excellent work.
> Detection models now can be exported to TRT engine with batch size > 1 - **inference code doesn't support it yet**, though now they could be used in Triton Inference Se…
-
Updating my Yomininja results in the program not being able to start. I saw a similar issue, but I can't really read it, so I have no idea whether it's related to that OCR engine; I use Lens.
```
PS C:…
-
(mimctalk) tom@tom-System:~/MimicTalk$ python inference/train_mimictalk_on_a_video.py
cp checkpoints/mimictalk_orig/os_secc2plane_torso/config.yaml checkpoints_mimictalk/GER
/home/tom/miniconda3/env…
-
http://www.nowcode.cn/nav.05.%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD/12.Triton-Inference.html
-
I have downloaded the Llama 3.2 1B model from Hugging Face with optimum-cli:
optimum-cli export openvino --model meta-llama/Llama-3.2-1B-Instruct llama3.2-1b/1
Below are the downloaded files:
!…
-
Hey, can you kindly tell me whether we can integrate a TRT-LLM-built engine (Whisper, to be precise) into a DeepStream pipeline? As far as I know, we can either use a TRT engine directly (not sure about trt-llm…
-
I want to run h2ogpt with just an inference API, without specifying a base model name.
For example, I have my LLaMA model deployed on an external server that exposes an API for inference, so I want to cons…
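If the external server speaks an OpenAI-style completions API (an assumption about the deployment; the `/v1/completions` path, payload fields, and both function names below are placeholders for illustration, not h2oGPT's actual configuration), the client side of "inference API only" can be as small as a stdlib POST:

```python
import json
import urllib.request

def build_payload(prompt, max_tokens=128):
    # Minimal OpenAI-style completion payload; the field names are
    # assumptions about the external server's API, not h2oGPT internals.
    return {"prompt": prompt, "max_tokens": max_tokens}

def remote_infer(base_url, prompt):
    # POST the payload to a hypothetical /v1/completions endpoint on
    # the external server and return the parsed JSON response.
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a shape like this, no base model name is needed locally; the external server owns the model and the client only knows the endpoint URL.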