-
### 🚀 The feature, motivation and pitch
I used [vLLM 0.5.0.post1](https://github.com/vllm-project/vllm/releases/tag/v0.5.0.post1) for `Mixtral-8x7B-Instruct-v0.1` inference:
```bash
python3 -m vll…
-
With the rise of APIs that use server-sent events (SSE), like ChatGPT, it is becoming increasingly common to want to load-test them and measure time-to-first-byte (TTFB).
For example, TTFB can be a prox…
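As an illustration, TTFB for a streaming endpoint can be measured by timing how long the first body byte takes to arrive. Below is a minimal, self-contained sketch using only the standard library; the local SSE server and its delays are stand-ins for a real API, not part of any particular load-testing tool:

```python
import http.server
import threading
import time
from urllib.request import urlopen

class SSEHandler(http.server.BaseHTTPRequestHandler):
    """Toy server that streams a few SSE events, sleeping before each one."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.end_headers()
        for i in range(3):
            time.sleep(0.05)  # simulated per-token generation latency
            self.wfile.write(f"data: token {i}\n\n".encode())
            self.wfile.flush()

    def log_message(self, *args):
        pass  # keep the demo output quiet

def measure_ttfb(url):
    """Return (seconds until first body byte, seconds until full response)."""
    start = time.perf_counter()
    with urlopen(url) as resp:
        resp.read(1)  # blocks until the first byte of the body arrives
        ttfb = time.perf_counter() - start
        resp.read()   # drain the rest of the stream
    total = time.perf_counter() - start
    return ttfb, total

# Serve exactly one request on an ephemeral port, then measure against it.
server = http.server.HTTPServer(("127.0.0.1", 0), SSEHandler)
port = server.server_address[1]
threading.Thread(target=server.handle_request, daemon=True).start()

ttfb, total = measure_ttfb(f"http://127.0.0.1:{port}/")
server.server_close()
print(f"TTFB: {ttfb:.3f}s, total: {total:.3f}s")
```

Because the server sleeps 0.05 s before each of three events, TTFB lands near 0.05 s while the total response time is roughly three times that, which is exactly the gap a byte-level timer captures and a whole-response timer hides.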
-
### Your current environment
Not applicable -- Dockerfile.
### 🐛 Describe the bug
Steps to reproduce:
- Clone the `vllm` repo
- Run `docker build . --target vllm-base`
- Build fails
```shel…
-
It seems to me that, for now, mlc tries to load all the weights onto a single GPU card.
After convert_weight/gen_config/compile, it reports an error when it is about to serve:
```
AssertionError: Cannot estimat…
-
Suppose users are interested in certain topics that are not yet covered in the encyclopedia. Is it possible for them to provide feedback on the web so that new issues can be included for the dev site t…
-
### Your current environment
I am currently using a T4 instance on Google Colaboratory.
```
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used…
-
**Describe the bug**
Streaming with an LLM node requires `stream: true` in the `inputs` of the LLM node in `flow.dag.yaml`. Annoyingly, it gets deleted whenever you run the flow in VS Code, so when you deploy to doc…
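For reference, a minimal sketch of the relevant node entry is below. Only the `stream: true` input comes from the report; the node name, template path, connection, and deployment name are placeholder assumptions:

```yaml
nodes:
- name: chat            # hypothetical node name
  type: llm
  source:
    type: code
    path: chat.jinja2   # hypothetical prompt template
  inputs:
    deployment_name: gpt-35-turbo  # placeholder deployment
    stream: true        # required for streaming; gets dropped when the flow is run in VS Code
  connection: open_ai_connection   # placeholder connection
  api: chat
```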
-
I ran into an issue with the Docker environment on Windows while running vllm serving.
I tried the start_service.sh script inside the Docker container:
https://github.com/intel-analytics/ipex-llm/tree/main/docker/llm/serving/xp…
-
Support for training a customized predictor for a specific LLM by adding a flag that specifies the model name from the [dataset](https://huggingface.co/datasets/lmsys/lmsys-chat-1m).
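A minimal sketch of how such a flag could select a model's conversations before training. The in-memory records only mimic the dataset's `model` field, and the flag name is an assumption; a real run would load the actual data with `datasets.load_dataset("lmsys/lmsys-chat-1m")` instead:

```python
import argparse

# Hypothetical records mimicking the lmsys/lmsys-chat-1m schema, where each
# conversation is tagged with the "model" that produced it.
RECORDS = [
    {"model": "vicuna-13b", "conversation": [{"role": "user", "content": "hi"}]},
    {"model": "llama-2-7b-chat", "conversation": [{"role": "user", "content": "hey"}]},
    {"model": "vicuna-13b", "conversation": [{"role": "user", "content": "hello"}]},
]

def filter_by_model(records, model_name):
    """Keep only the conversations produced by the requested model."""
    return [r for r in records if r["model"] == model_name]

parser = argparse.ArgumentParser()
parser.add_argument("--model-name", required=True,
                    help="train the predictor only on this model's conversations")
# Simulated CLI invocation; in practice the args come from the command line.
args = parser.parse_args(["--model-name", "vicuna-13b"])

subset = filter_by_model(RECORDS, args.model_name)
print(f"{len(subset)} conversations selected for {args.model_name}")
```

The training loop itself is unchanged; the flag only narrows the dataset to one model's conversations before it runs.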
-
### What is the issue?
Ollama fails to start properly on a system running in CPU-only mode. This happened after I upgraded to the latest version (0.1.30) using the curl command from the docs. …