-
*Is your feature request related to a problem? Please describe.*
The Triton Python backend should provide dynamic batching, just like the other backends supported by Triton. For example:
For the model config ment…
-
Hello,
I tried to use the NVIDIA Triton streaming configuration with the pruned stateless 7 streaming model, but it seems that one encoder input, "avg_cache", is missing. This seems to have been added in new zip…
-
Epoch [1/3]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File :21, in _fwd_kernel(Q, K, V,…
-
Checklist
- [x] I've prepended issue tag with type of change: [feature]
- [x] (If applicable) I've documented below the DLC image/dockerfile this relates to
- [x] (If applicable) I've documented th…
-
### What happened?
A pod with an SR-IOV NIC device attached to it fails to attach the correct SR-IOV device when the node is hard rebooted after volumes are attached to it. The node is a VM in an OpenStack cloud pr…
-
Is there any way we can save the model with the registered custom ops, so that we don't have to register the custom ops each time we load the ONNX model? Right now every time we load the model, w…
-
### 🐛 Describe the bug
Greetings,
I was directed to this repository because I am encountering an issue with PyTorch. Specifically, I am experiencing an error with loading Triton when attempting to ru…
-
Hey! You have a wonderful project. Tell me, if possible, how to run the example "Calculating the speed of cars using YOLO v4 in real time" and other examples in this repository in multi-camera mode. I…
-
### Describe the bug
If you create a gradient through MiniMessage and insert a key into it, the text that comes out of the key is set only to the first color of the gradient.
![image](https:…
-
model: baichuan1 13b
enable inflight_fused_batching
**good case post:**
`curl -X POST 10.60.133.200:8030/v2/models/ensemble/generate -d '{"max_tokens": 90, "bad_words": "", "stop_words": "", "t…