-
Thanks for your brilliant work!
I have a small question regarding the content of the paper: In Table 1, for the sequence length, will the total length of the Megatron-LM method be equal to the sum of the …
-
Hello,
Are there any plans to support an offline batch-inference mode with local models, without spinning up an additional server, similar to [what is implemented in vLLM](https://docs.vllm.ai/en/latest/get…
-
**Your question**
```[tasklist]
### Tasks
```
-
https://arxiv.org/pdf/2201.12023.pdf
-
### System Info
```Shell
- `Accelerate` version: 0.18.0
- Platform: Linux-3.10.0-1160.76.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.9.12
- Numpy version: 1.22.4
- PyTorch version (…
-
I am trying to build LLaMA 7B using 2-way tensor parallelism,
but when I execute run.py I get this error: `AssertionError: Engine world size (2) != Runtime world size (1)`
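This mismatch typically means the engine was built for 2-way tensor parallelism but launched as a single process; in TensorRT-LLM the runtime world size comes from the number of MPI ranks. A sketch of the fix, assuming a standard `run.py` invocation (engine and tokenizer paths below are illustrative, adjust to your build):

```shell
# An engine built with tp_size=2 expects 2 MPI ranks at runtime,
# so launch run.py under mpirun with -n 2 instead of plain python3.
mpirun -n 2 \
    python3 run.py \
    --engine_dir ./llama_7b_tp2_engine \
    --tokenizer_dir ./llama-7b
```

Running the same script without `mpirun` yields a runtime world size of 1, which triggers the assertion above.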
-
We use an A10 GPU and the CodeLlama-7B model from [HuggingFace](https://huggingface.co/codellama/CodeLlama-7b-hf/tree/main),
with the latest tensorrtllm_backend and TensorRT-LLM from the main branch.…
-
Thanks for the latest updates and improvements!
I was looking into the different LLaVA example notebooks and the [VILA example](https://github.com/mit-han-lab/llm-awq/blob/main/scripts/vila_example.s…
-
I used the following steps to build an SQ engine.
First, build the docker image from the main branch:
```
git clone -b main https://github.com/triton-inference-server/tensorrtllm_backend.git
# Update the su…
-
python3.6 test_client.py uci_housing_client/serving_client_conf.prototxt
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0425 03:51:49.812734 3687 general_model.cpp:73] feed var n…