vllm Search Results - Githubissues

1000+ results for vllm


EleutherAI/lm-evaluation-harness #1625

Speed up inference problems

I am trying to speed up benchmarking on A100. Below are times of tests on one task in two versions using Mistral. ![image](https://github.com/EleutherAI/lm-evaluation-harness/assets/1849959/f012818…

djstrong updated 3 months ago · 19 comments
modelscope/swift #794

Langchain-Chatchat: abnormal inference after deploying the fine-tuned model

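**Describe the bug** After rsLoRA fine-tuning of yi-6B-chat, inference works normally in both Swift's web UI and command-line infer, but once the model is served with the fastchat backend, the inference output is garbled. ![image](https://github.com/modelscope/swift/assets/28507966/ed565565-c6b7-4488-a491-ef5c19048d5b) For LoRA fine-tuning of yi-6B-…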

WSC741606 updated 2 months ago · 8 comments
vllm-project/vllm #5721

[New Model]: Chameleon support

### The model to consider. https://huggingface.co/facebook/chameleon (as of now, the models can be downloaded using the [model form](https://ai.meta.com/resources/models-and-libraries/chameleon-do…

nopperl updated 2 weeks ago · 3 comments
DaoCloud/public-image-mirror #10635

docker.io/vllm/vllm-openai:v0.5.0.post1

IMAGE SYNC

eijix updated 3 weeks ago · 3 comments
vllm-project/vllm #5202

May I ask when the qwen moe quantization version is supporte…

### 🚀 The feature, motivation and pitch As the title suggests: vLLM currently supports MoE models, but not their quantized versions. In use, the quantized version would offer better cost-…

wellcasa updated 1 month ago · 1 comment
vllm-project/vllm #6156

[Bug]: When starting deepseek-coder-v2-lite-instruct with vl…

### Your current environment ```text Collecting environment information... PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A …

fengyang95 updated 1 day ago · 6 comments
outlines-dev/outlines #684

Allow Tokens to Span Multiple Terminals in CFG

### Discussed in https://github.com/outlines-dev/outlines/discussions/683 Originally posted by **lapp0** January 23, 2024 ### What behavior of the library made you think about the improvement?…

brandonwillard updated 1 month ago · 2 comments
ContextualAI/gritlm #2

How would one go about running embedding as a service using …

I would like to run embedding as a service using something like vLLM on a Docker container on different host. How would one go about doing this?

sungkim11 updated 4 months ago · 1 comment
THUDM/LongBench #48

Evaluate on long context (32k, 64k, etc.) on 30B/70B large mo…

Hi, I found that the original script cannot handle large models on long contexts effectively, since it uses multiprocessing to load an entire model onto a single GPU. I also tried different methods to…

CaesarWWK updated 5 months ago · 5 comments
EmbeddedLLM/vllm-rocm #21

Unable to load models on RX 6800

On my RX 6800 I get `RuntimeError: FlashAttention only supports AMD MI200 GPUs or newer.` for some reason. I Googled that GPU, and it seems to be RDNA2 like mine, but for enterprise. Is this not…

nonetrix updated 2 months ago · 2 comments
