-
Hey! You have a wonderful project. Could you tell me, if possible, how to run the example "Calculating the speed of cars using YOLO v4 in real time", and the other examples in this repository, in multi-camera mode? I…
-
model: baichuan1 13b
enable inflight_fused_batching
**good case post:**
`curl -X POST 10.60.133.200:8030/v2/models/ensemble/generate -d '{"max_tokens": 90, "bad_words": "", "stop_words": "", "t…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Are you using forge?
No
### Installed conforming to our guide?
- [X] I have read the installation guide and …
-
Is there any way we can save the model with the registered custom ops, so that each time we load the ONNX model we don't have to register the custom ops again? Right now every time we load the model, w…
-
### System Info
We are deploying the model meta-llama/Meta-Llama-3.1-70B-Instruct with FP8 quantization and everything works perfectly for hours until the server crashes with this error:
2024-10-…
-
CUDA supports:
https://github.com/kimlimjustin/xplorer/blob/master/src/Service/app.ts
https://github.com/launchbadge/sqlx
https://github.com/Jimver/cuda-toolkit
https://github.com/LLukas22/llm-r…
-
Currently we only support dynamically-installed conda environments, but that is not well-suited for production usage.
@jiaodong I think we should make this a requirement for OSS jobs release.
``…
-
**Is this a BUG REPORT or FEATURE REQUEST?**:
> Uncomment only one, leave it on its own line:
>
> /kind bug
> /kind feature
**What happened**:
Investigate if we can use https://github.…
-
Hello,
I tried to use the NVIDIA Triton streaming configuration with the pruned stateless 7 streaming model, but it seems that one encoder input, "avg_cache", is missing; this seems to have been added in new zip…
-
### 🐛 Describe the bug
cross-posting from https://github.com/VKCOM/YouTokenToMe/issues/113 (since I'm not sure if it belongs in pytorch or YouTokenToMe):
reduced from a more complex example:
``…