-
I am deploying Example 1: [Using Joint Inference Service in Helmet Detection Scenario](https://github.com/kubeedge/sedna/blob/main/examples/joint_inference/helmet_detection_inference/README.md).
edge…
-
Appears when generating links
> Traceback (most recent call last):
> File "/content/sd-inference-server/remote.py", line 8, in
> import storage
> File "/content/sd-inference-server/stor…
-
### Describe the problem
Feature request for https://github.com/huggingface/text-embeddings-inference. It is new and cool, and it would be good to have.
### Describe the proposed solution
create a custom Embe…
-
[[Open issues - help wanted!]](https://github.com/vllm-project/vllm/issues/4194#issuecomment-2102487467)
**Update [9/8] - We have finished the majority of the refactoring and made extensive progress fo…
-
__Current tasks:__
- [ ] prototype bloom points system @borzunov (#6)
- [x] local tensor parallelism (#143, using [BlackSamorez/tensor_parallel](https://github.com/BlackSamorez/tensor_parall…
-
**Is your feature request related to a problem? Please describe.**
Rust API for Triton Server to integrate Triton in-process with a Rust Server
Rust is now a universally recommended language to deve…
-
https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md
If possible, add a speculative decoding example to the llama docs.
-
### Operating System
Windows
### Version Information
Recently we discovered a problem caused by an error in the DNN scoring script file. **Please see the workaround in Additional information sec…
-
Thanks for sharing your script to run the 4-bit quantized molmo-7b.
Unfortunately, I am unable to run it on my server (Ubuntu 22.04 with 2x RTX A5000 48 GB VRAM) - the error trace is below.
I wonde…
-
I am having some issues with the DeepMreye demo using the example data from the first two participants of the sample dataset, as instructed in the notebook "deepmreye_example_usage_pretrained_model_w…