-
**Description**
We use gRPC to query Triton for Model Ready, Model Metadata, and Model Inference requests. When running the Triton server for a sustained period, we get unexpected segfaults …
-
Loaded cached embeddings from file.
Checking if the server is listening on port 8890...
Server not ready, waiting 4 seconds...
Traceback (most recent call last):
File "D:\LivePortrait-Windows-v2…
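The log above shows a readiness loop that polls a port before proceeding. A minimal sketch of such a check, assuming only the standard library (the function name is hypothetical; the port and 4-second interval are taken from the log):

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=4.0):
    """Poll until a TCP server accepts connections on (host, port)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True  # server is listening
        except OSError:
            print(f"Server not ready, waiting {interval:g} seconds...")
            time.sleep(interval)
    return False

# Example: wait for a local server on port 8890, as in the log above
# if not wait_for_port("127.0.0.1", 8890):
#     raise RuntimeError("Server never became ready")
```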
-
### System Info
text-generation-inference version 2.2.0
model "mistralai/Mixtral-8x7B-Instruct-v0.1"
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially supported c…
-
Hi,
I'm new to LangChain and LLMs.
I recently deployed an LLM using the Hugging Face text-generation-inference library on my local machine.
I've successfully accessed the model using …
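For context, text-generation-inference exposes a REST route at `/generate` that takes a JSON body with `inputs` and `parameters`. A minimal sketch of calling it from Python with only the standard library (the URL and `max_new_tokens` value are assumptions for a local deployment):

```python
import json
import urllib.request

TGI_URL = "http://localhost:8080/generate"  # assumed local TGI endpoint

def build_payload(prompt, max_new_tokens=64):
    # TGI's /generate route expects {"inputs": ..., "parameters": {...}}
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt):
    req = urllib.request.Request(
        TGI_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response body is {"generated_text": "..."}
        return json.loads(resp.read())["generated_text"]
```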
-
## Problem Description
The embedding/rerank models launched via the UI have no concurrency-related settings.
When the client sends requests using asyncio or concurrent.futures, it is actually slower than a synchronous for loop.
**How can I get the model to run inference concurrently?**
## Models launched on the xinference side
embedding:
rerank:
## Test Results
### embedding API tes…
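On the client side, fan-out only pays off if the server can actually process requests in parallel; if the model worker handles one request at a time, concurrent clients just queue. A minimal sketch of thread-based fan-out, where `embed` is a hypothetical stand-in for the blocking embedding call (replace it with a real client call to the xinference endpoint):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def embed(text):
    # Hypothetical stand-in for a blocking HTTP call to the embedding endpoint.
    time.sleep(0.05)     # simulated network + inference latency
    return [0.0] * 8     # dummy embedding vector

texts = [f"doc {i}" for i in range(16)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    vectors = list(pool.map(embed, texts))   # results keep input order
elapsed = time.perf_counter() - start
print(f"{len(vectors)} embeddings in {elapsed:.2f}s")
```

If this pattern is still no faster than a sequential loop, the bottleneck is likely server-side (a single model replica serializing requests), not the client.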
-
### OpenVINO Version
2024.03
### Operating System
Windows System
### Hardware Architecture
x86 (64 bits)
### Target Platform
Host Name: LAPTOP-D60VPN1Q
OS Name: …
-
vLLM is a popular choice for serving LLMs in production. It also has a strong community and iterates quickly to support new models.
-
Add support for inference services.
-
- [x] I have searched the [issues](https://github.com/seata/seata/issues) of this repository and believe that this is not a duplicate.
### Ⅰ. Issue Description
- `org.apache.seata:seata-mock…
-
If you submit a chat and press the stop button, Ollamac doesn't stop Ollama from streaming the response; it just stops updating the UI.
This is bad in general, but particularly bad when the mode…