-
Hello,
I'm new to LLM serving and multi-modal LLMs. I'm looking for an example for the LongVILA model similar to this one for the VILA1.5 models:
```
python -W ignore llava/eval/run_vila.py --mod…
```
-
Hello,
I am using a fine-tuned open-source LLM, and it works great in the Docker container after following the instructions to build TensorRT-LLM.
However, after building the installable wheel package, I am not …
-
### Question Validation
- [X] I have searched both the documentation and Discord for an answer.
### Question
I have spent a few hours looking through the documentation and asking in the Discord. I'…
-
### Feature Description
When using LLM serving frameworks such as [vLLM](https://github.com/vllm-project/vllm) or [MLC-LLM](https://github.com/mlc-ai/mlc-llm), or services that host open-source mod…
-
### System Info
x86_64
Ubuntu 20.04
8x A100
TensorRT-LLM version v0.9.0
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own modified scripts
…
-
# What I see
Usually the kvcache arg looks like
```mlir
%arg4: !torch.tensor
```
and is the last argument in `decode_bsX` and `prefill_bsX`.
But when I export ONLY `bs=1`, I see 50+ arguments, most of …
-
Hi, thanks for your great work! The issue I am concerned about is parallelism at deployment time compared to Lookahead. As far as I know, Lookahead currently does not support tensor parallelism, which i…
-
### System Info
Hi,
I generated a TensorRT-LLM engine for a Llama-based model and see that its performance is much worse than vLLM's.
I did the following:
- compile the model with TensorRT-LLM c…
-
My current environment:
```
0.4.2
```
My bug:
I deployed the hermes-2-pro-mistral-7b model with multiple LoRA adapters. After applying a heavy multi-adapter load on it, I started receiving an erro…
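For context, here is a minimal sketch of what such a deployment looks like through vLLM's offline Python API. The base-model name matches the report above, but the adapter name, ID, path, and the `max_loras`/`max_lora_rank` settings are illustrative assumptions, not details from the original issue.
```python
# Minimal multi-LoRA sketch using vLLM's offline API.
# Adapter path, name, ID, and the limits below are hypothetical placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="NousResearch/Hermes-2-Pro-Mistral-7B",
    enable_lora=True,   # must be set before any LoRARequest is accepted
    max_loras=4,        # max adapters resident on the GPU at once
    max_lora_rank=16,   # must cover the rank of every adapter served
)

sampling = SamplingParams(temperature=0.0, max_tokens=64)

# Each request can route to a different adapter; the integer ID must be
# unique and stable per adapter across requests.
outputs = llm.generate(
    ["Write one sentence about LoRA adapters."],
    sampling,
    lora_request=LoRARequest("adapter-a", 1, "/path/to/adapter-a"),
)
print(outputs[0].outputs[0].text)
```
Under a heavy multi-adapter load, `max_loras` (and `max_cpu_loras`) govern how adapters are swapped between CPU and GPU memory, so those limits are one place to look when load-dependent errors appear.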
-
### 🚀 The feature, motivation and pitch
Ollama has the Docker image [ollama/ollama:rocm](https://hub.docker.com/layers/ollama/ollama/rocm/images/sha256-2368286e0fca3b4f56e017a9aa4809408d8a8c6596e3cbd…