-
### Motivation.
OpenVINO is an open-source solution for inference of deep learning models, including LLMs. OpenVINO supports both Intel and ARM CPUs, Intel integrated and discrete GPUs, and NPUs, and has a goo…
-
**Describe the feature**
Typically, OpenAI APIs are protected with an API_KEY, so to send requests to such APIs it is necessary to specify the API_KEY in the call.
Is it planned to add such functionalit…
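For context, OpenAI-style APIs conventionally pass the key as a bearer token in the `Authorization` header of each request. A minimal client-side sketch using only the standard library (the URL, key, and model name below are placeholders, not values from this issue):

```python
import json
from urllib.request import Request

API_KEY = "sk-example"                 # placeholder key
BASE_URL = "http://localhost:8000/v1"  # placeholder server address

# OpenAI-style APIs expect "Authorization: Bearer <API_KEY>"
# on every request to a protected endpoint.
body = json.dumps({"model": "my-model", "prompt": "Hello"}).encode()
req = Request(
    f"{BASE_URL}/completions",
    data=body,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

print(req.get_header("Authorization"))  # Bearer sk-example
```

Server-side, the endpoint would compare this header against its configured key and reject mismatches with HTTP 401.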
-
Hi,
I realize that this is a big ask, but I am learning more and more about inferencing, and I've heard that VLLM tends to have better performance for multi-GPU training.
OLLAMA offers a great UX, and I…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
python3 -m vllm.entrypoints.openai.api_server --model /model/models/gemma-2-27b-it/ --dtyp…
-
When running a Qwen1.5 model, it loads but throws this error when serving:
```
handle:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/vllm-0.4.2+cpu-py3.11-lin…
-
### The model to consider.
https://huggingface.co/microsoft/Phi-3-medium-128k-instruct
I was trying to run the exl2 quants for these models, but I am getting an error at the rotary embedding; these models us…
-
Hi. Thanks for sharing great works!
I wonder what is the role of `scale_watershed` in https://github.com/Alpha-VLLM/Lumina-T2X/blob/7bc7d7d70a20a262b4f04e873497f58f722aa224/lumina_next_t2i/models/m…
-
We already have a brief description about this proposed feature in the vLLM issue (https://github.com/vllm-project/vllm/issues/3563), but we still need a more detailed design document:
* Value prop…
-
Device info: 8×A800 80G
The launch command is as follows:
```bash
nohup python -m vllm.entrypoints.openai.api_server \
--served-model-name Qwen2-57B-A14B-Instruct \
--model /media/user/data_one/nlp_model/Qwen2-57B-A14B-I…
-
### Motivation
The `min_p` sampling parameter is becoming quite popular. It's conceptually simple and "makes sense", and (at least anecdotally, according to opinions of many model fine-tuners and u…
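To make the proposal concrete, here is my sketch of the core idea behind `min_p` sampling (not vLLM's actual implementation): tokens whose probability falls below `min_p` times the top token's probability are masked out, and the survivors are renormalized.

```python
def min_p_filter(probs, min_p=0.1):
    """Sketch of min_p sampling: drop tokens whose probability is
    below min_p * max(probs), then renormalize the remainder."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# With min_p=0.2 the threshold is 0.2 * 0.5 = 0.1,
# so the 0.05 tail token is removed before renormalization.
print(min_p_filter([0.5, 0.3, 0.15, 0.05], min_p=0.2))
```

Unlike a fixed `top_p` cutoff, the threshold scales with the model's confidence: a peaked distribution prunes aggressively, while a flat one keeps more candidates.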