-
/kind feature
**Describe the solution you'd like**
Please add [https://github.com/xorbitsai/inference](https://github.com/xorbitsai/inference) as a KServe serving runtime for Hugging Face LLMs.
Xor…
-
### Motivation
This is an interesting blog post [FireAttention V2: 12x faster to make Long Contexts practical for Online Inference](https://fireworks.ai/blog/fireattention-v2-long-context-inference…
-
### 🚀 The feature, motivation and pitch
DeepSeek-V2 introduces **MLA (Multi-head Latent Attention)**, which uses low-rank key-value joint compression to eliminate the bottleneck of inference-time key…
-
### Describe the issue
There is no example of multi-threading on Android devices when using multiple models.
### To reproduce
N/A
### Urgency
I really need this ASAP; please help…
-
I got the error "Error Building Component
Error building vertex Hugging Face API: Failed to resolve model_id:Could not find model id for inference server: https://api-inference.huggingface.co/models/mi…
-
Do you plan to support the JetMoE model (https://github.com/myshell-ai/JetMoE) in litgpt? It is very effective at reducing the computational cost of inference.
-
https://docs.together.ai/docs/inference-models
https://docs.together.ai/reference/chat-completions
-
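**Problem Description**
Adding a txt-format knowledge base in the knowledge base management feature fails, and the chatchat backend reports ModuleNotFoundError: No module named 'unstructured_inference.inference.ordering'.
**Steps to Reproduce**
1. On the knowledge base management page, click “上…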
-
As you may know, Raspberry Pi Ltd has launched the [Raspberry Pi AI Kit](https://www.raspberrypi.com/documentation/accessories/ai-kit.html). It's essentially a HAT+ with a Hailo AI NPU for inference. From …
-
### Describe the issue
This issue is a place to discuss the impact of not being able to rely on the `name` field on messages, along with existing or proposed solutions to cater for this.
---
The `n…