-
### 🚀 The feature, motivation and pitch
I used [vLLM 0.5.0.post1](https://github.com/vllm-project/vllm/releases/tag/v0.5.0.post1) for `Mixtral-8x7B-Instruct-v0.1` inference:
```bash
python3 -m vll…
-
With the rise of APIs that use server-sent events (SSE), like ChatGPT, it is becoming increasingly common to want to load-test them and measure time-to-first-byte (TTFB).
For example, TTFB can be a prox…
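As an illustration, TTFB for a streaming endpoint can be measured by timing how long the first body byte takes to arrive. Below is a minimal, self-contained sketch using only the standard library; the local SSE server and its delays are stand-ins for a real API, not part of any particular load-testing tool:

```python
import http.server
import threading
import time
from urllib.request import urlopen

class SSEHandler(http.server.BaseHTTPRequestHandler):
    """Toy server that streams a few SSE events, sleeping before each one."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.end_headers()
        for i in range(3):
            time.sleep(0.05)  # simulated per-token generation latency
            self.wfile.write(f"data: token {i}\n\n".encode())
            self.wfile.flush()

    def log_message(self, *args):
        pass  # keep the demo output quiet

def measure_ttfb(url):
    """Return (seconds until first body byte, seconds until full response)."""
    start = time.perf_counter()
    with urlopen(url) as resp:
        resp.read(1)  # blocks until the first byte of the body arrives
        ttfb = time.perf_counter() - start
        resp.read()   # drain the rest of the stream
    total = time.perf_counter() - start
    return ttfb, total

# Serve exactly one request on an ephemeral port, then measure against it.
server = http.server.HTTPServer(("127.0.0.1", 0), SSEHandler)
port = server.server_address[1]
threading.Thread(target=server.handle_request, daemon=True).start()

ttfb, total = measure_ttfb(f"http://127.0.0.1:{port}/")
server.server_close()
print(f"TTFB: {ttfb:.3f}s, total: {total:.3f}s")
```

Because the server sleeps 0.05 s before each of three events, TTFB lands near 0.05 s while the total response time is roughly three times that, which is exactly the gap a byte-level timer captures and a whole-response timer hides.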
-
### Your current environment
Not applicable -- Dockerfile.
### 🐛 Describe the bug
Steps to reproduce:
- Clone the `vllm` repo
- Run `docker build . --target vllm-base`
- Build fails
```shel…
-
It seems to me that, for now, mlc tries to load all the weights onto a single GPU card.
After convert_weight/gen_config/compile, it reports an error when it is about to serve:
```
AssertionError: Cannot estimat…
-
Suppose users are interested in certain topics that are not yet covered in the encyclopedia. Is it possible for them to provide feedback on the web so that new issues can be included for the dev site t…
-
### Your current environment
I am currently using a T4 instance on Google Colaboratory.
```
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used…
-
**Describe the bug**
Streaming with an LLM node requires `stream: true` in the `inputs` of the LLM node in `flow.dag.yaml`. Annoyingly, it gets deleted whenever you run the flow in VS Code, so when you deploy to doc…
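For reference, a minimal sketch of the relevant node entry is below. Only the `stream: true` input comes from the report; the node name, template path, connection, and deployment name are placeholder assumptions:

```yaml
nodes:
- name: chat            # hypothetical node name
  type: llm
  source:
    type: code
    path: chat.jinja2   # hypothetical prompt template
  inputs:
    deployment_name: gpt-35-turbo  # placeholder deployment
    stream: true        # required for streaming; gets dropped when the flow is run in VS Code
  connection: open_ai_connection   # placeholder connection
  api: chat
```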
-
I ran into an issue with the Docker environment on Windows while running vllm serving.
I tried the start_service.sh script inside the Docker container:
https://github.com/intel-analytics/ipex-llm/tree/main/docker/llm/serving/xp…
-
Support for training a customized predictor for a specific LLM by adding a flag that specifies the model name from the [dataset](https://huggingface.co/datasets/lmsys/lmsys-chat-1m).
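A minimal sketch of how such a flag could select a model's conversations before training. The in-memory records only mimic the dataset's `model` field, and the flag name is an assumption; a real run would load the actual data with `datasets.load_dataset("lmsys/lmsys-chat-1m")` instead:

```python
import argparse

# Hypothetical records mimicking the lmsys/lmsys-chat-1m schema, where each
# conversation is tagged with the "model" that produced it.
RECORDS = [
    {"model": "vicuna-13b", "conversation": [{"role": "user", "content": "hi"}]},
    {"model": "llama-2-7b-chat", "conversation": [{"role": "user", "content": "hey"}]},
    {"model": "vicuna-13b", "conversation": [{"role": "user", "content": "hello"}]},
]

def filter_by_model(records, model_name):
    """Keep only the conversations produced by the requested model."""
    return [r for r in records if r["model"] == model_name]

parser = argparse.ArgumentParser()
parser.add_argument("--model-name", required=True,
                    help="train the predictor only on this model's conversations")
# Simulated CLI invocation; in practice the args come from the command line.
args = parser.parse_args(["--model-name", "vicuna-13b"])

subset = filter_by_model(RECORDS, args.model_name)
print(f"{len(subset)} conversations selected for {args.model_name}")
```

The training loop itself is unchanged; the flag only narrows the dataset to one model's conversations before it runs.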
-
### What is the issue?
Ollama fails to start properly on a system running in CPU-only mode. This happened after I upgraded to the latest version (0.1.30) using the curl command from the docs. …