-
### Your current environment
I am currently using a T4 instance on Google Colaboratory.
```
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used…
```
-
Can an already-quantized model like https://huggingface.co/01-ai/Yi-34B-Chat-4bits be compiled directly in mlc_llm?
I tried passing the --quant option directly, e.g. q0f16 or q4f16, but it reports some lay…
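For reference, MLC-LLM's usual documented flow starts from the original (unquantized) weights and applies its own quantization during conversion. Below is a minimal sketch of that flow, assuming a recent `mlc_llm` CLI; the paths, the `q4f16_1` quantization name, and the `chatml` conversation template are assumptions and not taken from the report above, and subcommand/flag names may differ across versions:

```sh
# Sketch only: convert the unquantized weights and let MLC apply q4f16 quantization,
# instead of starting from the already-quantized -4bits checkpoint.
mlc_llm convert_weight ./Yi-34B-Chat --quantization q4f16_1 -o ./dist/Yi-34B-Chat-q4f16_1
mlc_llm gen_config ./Yi-34B-Chat --quantization q4f16_1 --conv-template chatml \
    -o ./dist/Yi-34B-Chat-q4f16_1
mlc_llm compile ./dist/Yi-34B-Chat-q4f16_1/mlc-chat-config.json \
    -o ./dist/Yi-34B-Chat-q4f16_1/Yi-34B-Chat-q4f16_1-cuda.so
```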
-
`git clone --depth 1 --single-branch https://huggingface.co/Qwen/Qwen1.5-4B-Chat-GPTQ-Int4`
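The log below comes from llama.cpp's HF-to-GGUF converter starting up; presumably it was invoked roughly like the following sketch (the script is named `convert-hf-to-gguf.py` or `convert_hf_to_gguf.py` depending on the llama.cpp version, and the output filename here is a placeholder):

```sh
# Sketch: run llama.cpp's converter on the cloned checkout (output name is a placeholder).
python convert-hf-to-gguf.py ./Qwen1.5-4B-Chat-GPTQ-Int4 --outfile qwen1.5-4b-chat-gptq-int4.gguf
```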
```
INFO:hf-to-gguf:Loading model: Qwen1.5-4B-Chat-GPTQ-Int4
INFO:gguf.gguf_writer:gguf: This GGUF f…
```
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', bu…
```
-
[Whisper](https://github.com/openai/whisper) is an open-source model created by OpenAI.
The author of [ggml](https://github.com/ggerganov) provides a high-performance inference implementation using ggml call…
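That implementation is presumably whisper.cpp; for context, its README (at the time of writing, so build steps may have changed) gives roughly the quick start below, where the model name and sample file are the README's own examples rather than anything from this report:

```sh
# Quick-start sketch for the ggml-based Whisper implementation (whisper.cpp), following its README.
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
bash ./models/download-ggml-model.sh base.en   # fetch a ggml-format Whisper model
make                                           # older releases build a ./main binary
./main -m models/ggml-base.en.bin -f samples/jfk.wav
```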
-
Support for training a customized predictor for a specific LLM by adding a flag that specifies the model name from the [dataset](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)
-
I tried to deploy an API server using baichuan-7b, but I got the following error:
```
NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=6,7 python -m vllm.entrypoints.openai.api_server --model /root/data/zyy/baichua…
```
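The full command and the error text are truncated above; for reference, a minimal two-GPU launch sketch for vLLM's OpenAI-compatible server looks like the following, where the model path is a placeholder and `--tensor-parallel-size` / `--trust-remote-code` are standard vLLM flags rather than anything recovered from the truncated command:

```sh
# Sketch: serve a local Baichuan checkpoint on two GPUs.
NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=6,7 \
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/baichuan-7b \
    --tensor-parallel-size 2 \
    --trust-remote-code   # Baichuan's modeling code lives in its HF repo
```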
-
Suppose there are certain topics that users are interested in but that are not covered in the encyclopedia. Is it possible for them to provide feedback on the web so that new issues can be included for the dev site t…
-
/kind bug
**What steps did you take and what happened:**
I have a local cluster without internet access. Manifests version 1.8 is deployed on it. I deployed this version using images imported as t…
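For reference, a generic pattern for getting images onto an air-gapped cluster looks like the sketch below; the image name is a placeholder and the `ctr` step assumes the nodes run containerd, neither of which is stated in the report:

```sh
# On a machine with internet access: pull the image and save it to a tarball.
docker pull registry.example.com/some/image:1.8
docker save -o image.tar registry.example.com/some/image:1.8

# On the air-gapped node (containerd runtime): import into the k8s.io namespace.
ctr -n k8s.io images import image.tar
```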
-
### Your current environment
```text
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu …