-
### System Info
Docker image: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
Device: 8x H100
trt-llm backend: v0.11.0
### Who can help?
@byshiue @schetlur-nv
### Information
- [ ] The off…
-
### 🚀 The feature, motivation and pitch
Hi all, I was wondering if it's possible to do precise model device placement. For example, I would like to place the vLLM model on GPU 1 and let GPU 0 do othe…
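One common workaround (not a vLLM-specific placement API, just an assumption that the framework initializes CUDA lazily) is to restrict the devices visible to the process before any CUDA context is created:

```python
import os

# Hypothetical setup: expose only physical GPU 1 to this process.
# Any CUDA framework initialized afterwards (PyTorch, vLLM, ...)
# will see that device as cuda:0, leaving GPU 0 free for other work.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

print(os.environ["CUDA_VISIBLE_DEVICES"])  # "1"
```

Note this must run before the first CUDA call in the process; it has no effect once the library has already initialized a context.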
-
According to `scripts/profile/inference.jl`, I got the PNG below, which shows that on my 2080 Ti / 28-core CPU machine, batch size 512 is the best, at a cost of about `1.6e-5` per example.
![inference-gp…
-
### Describe the bug
Using a Celeron J4125, I am trying to run OpenVINO, but I get
```
[Step 7/11] Loading the model to the device
[ ERROR ] Check 'false' failed at src/inference/src/core.cpp:114…
-
I confirmed that the same problem occurred when using a model with batch normalization.
Thanks for solving the problem by switching to group normalization!
1. I need torch model files to use batc…
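For illustration, here is a minimal sketch (plain NumPy with an assumed layout and group count, not the actual model code) of why group normalization avoids the batch dependence: its statistics are computed per sample, so each sample's output is independent of the rest of the batch.

```python
import numpy as np

def group_norm(x, num_groups=8, eps=1e-5):
    # x: (N, C, H, W). Mean/variance are computed per sample and per
    # group, so one sample's output never depends on the other samples.
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 32, 8, 8))

full = group_norm(batch)          # normalized inside a batch of 4
single = group_norm(batch[:1])    # same sample normalized alone

print(np.allclose(full[:1], single))  # True: batch-size invariant
```

Batch normalization, by contrast, averages over the batch axis, which is exactly what makes its output change with batch composition.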
-
Given a prompt, the resulting embedding is slightly different depending on whether it was computed in a batch (`batch_size > 1`) or as a single inference.
For example, computing the embeddin…
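A minimal NumPy sketch of the underlying effect (hypothetical weights, not the actual embedding model): batched and per-row matrix products may take different kernel paths and reduction orders, so their results agree only up to floating-point tolerance, not bitwise.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 16)).astype(np.float32)  # stand-in weights
x = rng.standard_normal((8, 64)).astype(np.float32)   # 8 "prompts"

batched = x @ W                             # one batched GEMM
single = np.stack([row @ W for row in x])   # one GEMV per row

# Exact equality is not guaranteed: different kernels can round
# intermediate sums differently, so compare with a tolerance.
print(np.allclose(batched, single, atol=1e-5))  # True
```

In a full transformer the discrepancy compounds across layers (and padding/masking adds further differences), which is why batched and single-inference embeddings drift apart slightly.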
-
**Description**
Triton Server crashed after some period of time running inferences using Python Backend models. The Python backend models are running TensorRT models with [mmdeploy python api](https:/…
-
## 🐛 Bug
It seems like SSIM can take values larger than 1 when computed over an epoch. I cannot reproduce this error; I only observe it in TensorBoard after training.
![image](https://github…
-
**Is your feature request related to a problem? Please describe.**
As documented [here](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.htm…
-
Error 1:
```
ERROR: PosDefException: matrix is not Hermitian; Cholesky factorization failed.
Stacktrace:
[1] non_hermitian_error()
@ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/sr…