-
### 🚀 The feature, motivation and pitch
Right now, our implementation of RoPE assumes the rotation matrix is created and used in the [HuggingFace model code](https://github.com/huggingface/transform…
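For context, a minimal pure-Python sketch of the standard RoPE formulation the snippet refers to: each even/odd pair of dimensions is rotated by a position-dependent angle. This is an illustration only; the actual HuggingFace and vLLM implementations differ in how they pair dimensions and cache the cos/sin tables.

```python
import math

def rope_rotate(x, pos, base=10000.0):
    """Apply rotary position embedding to one head vector (sketch).

    Each pair (x[2i], x[2i+1]) is rotated by the angle
    pos * base**(-2i/d), the textbook RoPE formulation.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # -i/d == -2j/d for pair index j
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out

# Position 0 rotates by angle 0, so the vector is unchanged;
# rotations at any position preserve the vector's norm.
print(rope_rotate([1.0, 0.0, 0.5, 0.5], pos=0))
```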
-
Hello,
I am trying to integrate `guidellm` into a benchmark suite, where we run different load tests based on user concurrencies. We define user concurrencies as "users" that send requests after…
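A minimal sketch of the closed-loop "user concurrency" pattern the snippet describes, where each simulated user sends its next request only after the previous one returns. The helper names here are hypothetical; guidellm's actual scheduler is more elaborate.

```python
import threading
import time

def run_closed_loop(num_users, requests_per_user, send_request):
    """Simulate closed-loop load: each 'user' is a thread that issues
    its next request only after the previous one completes.
    (Hypothetical helper, not a guidellm API.)"""
    results = []
    lock = threading.Lock()

    def user_loop(uid):
        for i in range(requests_per_user):
            latency = send_request(uid, i)
            with lock:
                results.append((uid, i, latency))

    threads = [threading.Thread(target=user_loop, args=(u,))
               for u in range(num_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def fake_request(uid, i):
    """Stand-in for an LLM call; returns the observed latency."""
    start = time.perf_counter()
    time.sleep(0.001)
    return time.perf_counter() - start

# 4 concurrent users, 3 sequential requests each -> 12 results total.
print(len(run_closed_loop(num_users=4, requests_per_user=3,
                          send_request=fake_request)))
```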
-
In order to apply LLM2Vec to DictaLM we need:
- [ ] Identify base model - https://huggingface.co/collections/dicta-il/dicta-lm-20-collection-661bbda397df671e4a430c27
- [ ] Prepare dataset for MNTP…
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of…
OKUA1 updated
1 month ago
-
### What happened?
My usual build recipe and run scripts do not work after b3680. Something changed in b3681, but I don't know what.
I see this same failure across models and cli flags, so it seem…
-
### Describe the bug
I have downloaded the Hugging Face "meta-llama/Meta-Llama-3.1-8B-Instruct" model to do Q8_0 quantization using the latest llama.cpp, to keep it up-to-date, increase efficiency an…
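For readers unfamiliar with the format, a rough pure-Python sketch of how Q8_0-style block quantization works: each block of 32 floats is stored as one scale plus 32 int8 values. llama.cpp stores the scale as fp16; here it is a plain float for clarity, so this is an illustration, not the on-disk format.

```python
def quantize_q8_0(values, block_size=32):
    """Q8_0-style block quantization sketch: per block of 32 values,
    scale d = max(|x|) / 127 and quantized q = round(x / d)."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        d = amax / 127.0 if amax else 1.0
        q = [max(-127, min(127, round(v / d))) for v in block]
        blocks.append((d, q))
    return blocks

def dequantize_q8_0(blocks):
    """Reconstruct the floats: x ~= d * q."""
    return [d * qi for d, q in blocks for qi in q]

vals = [0.5, -1.0, 0.25, 0.75] * 8  # one block of 32 values
restored = dequantize_q8_0(quantize_q8_0(vals))
print(max(abs(a - b) for a, b in zip(vals, restored)))
```

The maximum round-trip error is bounded by half the scale, i.e. roughly `max(|x|) / 254` per block.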
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
Given that Llama 3 is only available in 70B and 8B sizes, it would be useful to have a TinyLlama-style model based on the Llama 3 tokenizer so that we can use it as a draft model for speculative decoding.
Are there pla…
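To illustrate why a small draft model sharing the target's tokenizer matters, here is a toy sketch of one greedy speculative-decoding round: the draft proposes k tokens and the target verifies them, accepting the longest matching prefix plus one corrected token. The real algorithm uses probabilistic acceptance over the two models' distributions; the model functions below are stand-ins.

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One round of greedy speculative decoding (sketch).

    draft_next/target_next are stand-ins for the two models' greedy
    next-token functions: list-of-tokens -> next token.
    """
    # 1. Cheap draft model proposes k tokens autoregressively.
    proposal = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)
    # 2. Target model verifies; keep matches, stop at first mismatch.
    accepted = []
    ctx = list(prefix)
    for t in proposal:
        want = target_next(ctx)
        if t == want:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(want)  # target's correction ends the round
            break
    else:
        accepted.append(target_next(ctx))  # all k accepted: bonus token
    return accepted

# Toy models over integer tokens: the draft counts up, the target also
# counts up but caps at 3, so they agree only on the first few tokens.
draft = lambda ctx: ctx[-1] + 1
target = lambda ctx: min(ctx[-1] + 1, 3)
print(speculative_step(draft, target, [0]))
```

With agreement on the first three tokens, one round yields four tokens for the price of a single target verification pass, which is the speedup a Llama 3 draft model would provide.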
cduk updated
2 months ago
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I want to run inference of a llama3. I don't know how to integrate it with vllm…
-
## Description
This issue tracks the process of facilitating this integration and ensures our repository is ready for incorporation into LlamaIndex.
## Objectives
- [ ] Evaluate the compatibi…