-
https://github.com/efeslab/Nanoflow
-
Hi there,
I am wondering what hardware Ray uses for serving in this llmperf leaderboard. Is it CPU or GPU? If it is GPU, what is the model?
Thanks,
Fizzbb
-
### System Info
Hi,
I generated a TensorRT-LLM engine for a Llama-based model and see that its performance is much worse than vLLM's (a rough vLLM baseline sketch follows the steps below). I did the following:
- compile model with TensorRT-LLM c…
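For comparison, here is a minimal sketch of the kind of vLLM offline baseline such numbers are usually measured against; the model name, prompts, and batch size are placeholders, not taken from the report above.

```python
# Hypothetical vLLM throughput baseline; model and prompts are placeholders.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # assumed model; swap in yours
params = SamplingParams(temperature=0.0, max_tokens=128)
prompts = ["Summarize the history of GPUs."] * 32  # small synthetic batch

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests and report tokens/second.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tok/s")
```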
-
HuggingFace provides a standard TGI Docker container for serving LLM requests.
It would be useful to take advantage of HuggingFace's TGI features for generation.
* GitHub: [Large Language Model Te…
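As an illustration, a minimal sketch of querying such a TGI container from Python, assuming a server is already running locally; the port, prompt, and generation parameters here are assumptions.

```python
# Sketch: call TGI's /generate endpoint on an assumed local server.
import requests

resp = requests.post(
    "http://localhost:8080/generate",  # assumed host/port
    json={
        "inputs": "What is Deep Learning?",    # placeholder prompt
        "parameters": {"max_new_tokens": 64},  # placeholder settings
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```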
-
Is there a data table for the benchmark?
-
Verifying the output of the glm4-9b-chat model as follows, the serving side reports an error:
curl --request POST \
--url http://127.0.0.1:8000/v1/chat/completions \
--header 'content-type: application/json' \
--data '{
"model": "glm-4-9…
-
Oobabooga/text-generation-webui - OpenAI API through plugin (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html)
ExLlamaV2 - OpenAI API (TabbyAPI)
GPT4ALL - OpenAI API
Llama.cpp -…
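Because every backend in this list speaks the OpenAI API, a single client can target any of them by swapping `base_url`; the ports below are illustrative defaults, not confirmed for each project.

```python
# Sketch: one OpenAI-compatible client, many backends (assumed ports).
from openai import OpenAI

BACKENDS = {
    "vllm": "http://localhost:8000/v1",       # vLLM OpenAI server default
    "tabbyapi": "http://localhost:5000/v1",   # assumed TabbyAPI port
    "llama.cpp": "http://localhost:8080/v1",  # assumed llama-server port
}

client = OpenAI(base_url=BACKENDS["vllm"], api_key="EMPTY")
resp = client.chat.completions.create(
    model="any-served-model",  # placeholder; use the backend's model id
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```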
-
### Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the [issues policy](https://www.github.com/mlflow/mlflow/blob/master/ISSUE_POLICY.md)
### Where…
-
We are trying to launch codegeex4-all-9b using vLLM, following the CodeGeeX4 GitHub:
https://github.com/THUDM/CodeGeeX4?tab=readme-ov-file#vllm
The scripts are as follows:
codegeex_offline_examp…
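The referenced script is truncated above; as a stand-in, this is a minimal vLLM offline-inference sketch of the shape the CodeGeeX4 README points at. The prompt and sampling values are placeholders, not copied from the original script.

```python
# Sketch: vLLM offline inference for codegeex4-all-9b (placeholder settings).
from vllm import LLM, SamplingParams

llm = LLM(model="THUDM/codegeex4-all-9b", trust_remote_code=True)
params = SamplingParams(temperature=0.2, max_tokens=256)  # placeholder values
outputs = llm.generate(["# write a quicksort in python\n"], params)
print(outputs[0].outputs[0].text)
```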
-
I get a “422 Unprocessable Entity” when calling a local LLM service and I don't know what's causing it.
![image](https://github.com/user-attachments/assets/6e010870-772f-4f18-aeb6-861c554e8091)
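A 422 from a FastAPI-style server (vLLM's OpenAI server, TGI, etc.) usually carries a JSON `detail` field naming the invalid or missing request field; this sketch surfaces it. The URL and payload are placeholders, since the original request is not shown.

```python
# Sketch: print the validation detail behind a 422 response.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",  # placeholder endpoint
    json={"model": "my-model",                    # placeholder payload
          "messages": [{"role": "user", "content": "hi"}]},
    timeout=30,
)
if resp.status_code == 422:
    print(resp.json()["detail"])  # pydantic errors: loc / msg / type
else:
    print(resp.status_code, resp.text)
```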