-
**What would you like to be added**:
LeaderWorkerSet should support heterogeneous resource requirements across Workers.
**Why is this needed**:
In the use case of disaggregated serving there m…
-
We sincerely appreciate your constructive feedback on our paper; we will update the revised version to address your concerns. Our responses are below.
Q1: Comparis…
-
### Your current environment
docker image: vllm/vllm-openai:0.4.2
Model: https://huggingface.co/alpindale/c4ai-command-r-plus-GPTQ
GPUs: RTX8000 * 2
### 🐛 Describe the bug
The model works f…
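For reference, a minimal sketch of how the model can be loaded under this setup with vLLM's offline `LLM` API (the prompt and sampling values are arbitrary placeholders; `tensor_parallel_size=2` matches the two RTX8000s):
```python
# Minimal sketch: load the GPTQ checkpoint across both GPUs.
# The prompt and sampling values are arbitrary placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="alpindale/c4ai-command-r-plus-GPTQ",
    quantization="gptq",        # GPTQ-quantized weights
    tensor_parallel_size=2,     # shard across the two RTX8000s
)
params = SamplingParams(temperature=0.7, max_tokens=64)
print(llm.generate(["Hello!"], params)[0].outputs[0].text)
```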
-
I get a coredump when decoding from multiple threads. It crashes in the Rust function `tokenizers_decode` (rust/src/lib.rs:199); here is the core backtrace.
Why doesn't it support multi-threading? I think dec…
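For illustration, the access pattern that triggers this looks roughly like the sketch below, expressed with the Python bindings for concreteness (the model name and inputs are placeholders; the actual crash is in the Rust `tokenizers_decode` entry point):
```python
# Sketch of the concurrent-decode pattern; model and inputs are placeholders.
from concurrent.futures import ThreadPoolExecutor
from tokenizers import Tokenizer

tok = Tokenizer.from_pretrained("bert-base-uncased")
ids = tok.encode("hello world").ids

def worker(_):
    # Every thread decodes through the same shared tokenizer instance.
    return tok.decode(ids)

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(worker, range(1000)))
print(results[0])
```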
-
[X] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug.
**Describe the bug**
I am unable to create a test dataset using Ollama models; it…
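For reference, the wiring looks roughly like this sketch (assuming ragas 0.1.x's `TestsetGenerator.from_langchain` and the `langchain_community` Ollama wrappers; the model tags and document loader are placeholders):
```python
# Sketch: generate a ragas test set backed by Ollama models.
# Assumes ragas 0.1.x and langchain_community; placeholders throughout.
from langchain_community.chat_models import ChatOllama
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from ragas.testset.generator import TestsetGenerator

generator_llm = ChatOllama(model="llama3")   # writes the questions
critic_llm = ChatOllama(model="llama3")      # filters/critiques them
embeddings = OllamaEmbeddings(model="llama3")

documents = TextLoader("my_docs.txt").load()
generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)
testset = generator.generate_with_langchain_docs(documents, test_size=10)
print(testset.to_pandas())
```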
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
Recently, we have seen reports of `AsyncEngineDeadError`, including:
- [ ] #5060
…
-
Envoy supports sending the full request body to the external authorization server via the with_request_body filter configuration. Do you think it would be possible to expose such a feature on the Securit…
-
Hi there, I've been following this work for a few months and find it a really amazing idea to run LLMs over the Internet. I'm also trying to improve Petals' model-inference performance in…
-
This is on an M3 MacBook Pro.
1. I'm following the guide. I already had Ollama set up and running, serving a llama3 variant that I tested; it's listening in the first terminal window.
2. I configured …
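In case it helps with debugging, a quick sanity check against the server from step 1 can be done like this (a sketch; it assumes Ollama's default port 11434 and a `llama3` model tag, so adjust to your variant):
```python
# Sketch: confirm the local Ollama server answers before configuring anything else.
# Assumes the default port 11434 and a "llama3" model tag.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hi", "stream": False},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])
```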
-
Hey,
Currently, Ollama saves models locally in a cache. To maintain different versions of LLMs or fine-tuned ones, and also for extensive monitoring, it would be good to provide integration with M…