-
### Your current environment
Collecting environment information...
INFO 08-28 14:32:56 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 08-28 14:3…
-
When I use the multimodal example, I downloaded the original model liuhaotian/llava-v1.5-7b, but this error occurs:
```text
llama = from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensor…
```
-
With the increasing popularity of LLMs, many companies have started to look into deploying LLMs.
Instead of `infer/predict`, `completions` and `embeddings` are being used. Most of the API supports…
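As a sketch of what these OpenAI-style endpoints expect, the helpers below build request bodies for `completions` and `embeddings`. The model names and token limit are placeholders, not values from the original report:

```python
import json

def completions_payload(prompt: str, model: str = "my-model", max_tokens: int = 16) -> dict:
    # Body for an OpenAI-style POST to /v1/completions.
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

def embeddings_payload(text: str, model: str = "my-embedding-model") -> dict:
    # Body for an OpenAI-style POST to /v1/embeddings.
    return {"model": model, "input": text}

# Serialize one body to see the wire format.
print(json.dumps(completions_payload("Hello, world"), indent=2))
```

Any OpenAI-compatible server (vLLM, llama.cpp's server, etc.) accepts bodies of this shape, which is why `completions`/`embeddings` have displaced ad-hoc `infer/predict` routes.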
-
As per title.
Example: with GPUs like 3060 12GB or 3090 24GB.
-
In line with the main philosophy of the Symbiont app, we want to use products that are open source and provide the option for self-hosting for maximum privacy and control.
-
## What
Let's support shape inference for operators whose inputs have dynamic shape.
I've made a list of operators to support dynamic-shaped LLM inference.
### First milestone (for token gen …
-
Some users may need to send batch requests with several prompt/schema pairs. It is possible to do this with the vLLM server integration using `aiohttp`, and we should document this.
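A minimal sketch of such batched requests, assuming a vLLM OpenAI-compatible server at a placeholder local URL and vLLM's `guided_json` extra parameter for passing each schema (both are assumptions, not part of the original report):

```python
import asyncio

API_URL = "http://localhost:8000/v1/completions"  # assumed local vLLM server

def build_payloads(pairs, model="my-model"):
    # One JSON body per (prompt, schema) pair; the schema rides along in
    # vLLM's guided_json field for schema-constrained generation.
    return [
        {"model": model, "prompt": prompt, "max_tokens": 64, "guided_json": schema}
        for prompt, schema in pairs
    ]

async def send_batch(payloads):
    # aiohttp is imported lazily so the payload helper works without it installed.
    import aiohttp
    async with aiohttp.ClientSession() as session:
        async def post(body):
            async with session.post(API_URL, json=body) as resp:
                return await resp.json()
        # Fire all requests concurrently; the server batches them internally.
        return await asyncio.gather(*(post(body) for body in payloads))

# With a server running:
#   pairs = [("Generate a user", {"type": "object"}),
#            ("Generate a city", {"type": "object"})]
#   results = asyncio.run(send_batch(build_payloads(pairs)))
```

Because `asyncio.gather` sends the requests concurrently, the server can schedule all prompts in the same continuous batch rather than serving them one at a time.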
-
### When the content type of the incoming messages is text, an error occurs.
**API**: `/v1/chat/completions`
### request
```json
{
"max_tokens": 0,
"model": "qwen-72b-chat-int4"…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N…
-
**Describe the bug**
When I run `ilab data generate`, there is no progress update or output like there was in 0.17.1.
```
(venv-instructlab-3.11) ➜ instructlab ilab data generate
INFO 2024-08-08 16:00:04,437 numexpr.utils…