-
### Proposal to improve performance
The speculative decoding performance of Eagle is worse than expected, as shown below:
Model: [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llam…
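As background for judging whether the measured speedup is reasonable: in the standard speculative-decoding analysis, with a per-token acceptance rate α and draft length k, the expected number of tokens accepted per target-model verification step is (1 − α^(k+1))/(1 − α). A toy sketch of that formula (the function name and the α/k values are illustrative, not taken from the issue):

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens per target-model verification step, for draft
    length k and per-token acceptance rate alpha (standard
    speculative-decoding analysis)."""
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1 - alpha ** (k + 1)) / (1 - alpha)

# A low acceptance rate caps the achievable speedup, no matter the draft length:
print(round(expected_tokens_per_step(0.5, 5), 3))  # prints 1.969
print(round(expected_tokens_per_step(0.9, 5), 3))  # prints 4.686
```

If the observed speedup is far below what the draft model's acceptance rate predicts, the bottleneck is likely elsewhere (e.g. draft-model overhead or batching), which is one way to narrow down a report like this.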
-
command:
accelerate launch run_evals_accelerate.py --model_args="Llama-2-7b-chat-hf-8bit,quantization_config="load_in_8bit=True"" --tasks "helm|hellaswag|1|0" -- --output_dir ./evalscratch
Resul…
-
### Problem Description
We cannot use the new Llama 3.1 models via the Groq API.
### Proposed Solution
Add the model as an option, and possibly allow specifying a custom model name.
### Additional Context
[Llama 3.1 405B …
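A minimal sketch of the proposed option: a registry of known Groq model IDs plus a custom-name fallback. All names below (the dict, the function, and the model-ID strings) are illustrative assumptions, not the project's actual API:

```python
from typing import Optional

# Hypothetical registry mapping user-facing aliases to Groq model IDs
# (the IDs here are assumptions for illustration).
KNOWN_GROQ_MODELS = {
    "llama-3.1-8b": "llama-3.1-8b-instant",
    "llama-3.1-70b": "llama-3.1-70b-versatile",
}

def resolve_model(name: str, custom_model: Optional[str] = None) -> str:
    """Return the model ID to send to the API: an explicit custom name
    wins; otherwise look up a known alias; otherwise pass the name
    through unchanged so new models work without a code change."""
    if custom_model:
        return custom_model
    return KNOWN_GROQ_MODELS.get(name, name)

print(resolve_model("llama-3.1-70b"))                      # prints llama-3.1-70b-versatile
print(resolve_model("anything", custom_model="my-model"))  # prints my-model
```

The pass-through default is the key design choice: a custom-model field means users are not blocked the next time a new model ships before the registry is updated.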
-
/kind feature
**Describe the solution you'd like**
Is there any config to modify the imagePullPolicy of queue-proxy? This question has stumped me for a long time, and I've read the docs of kserve & knat…
-
### Description:
I encountered an issue while trying to run my Flutter application using the `llama_cpp_dart` package. The error occurs when the app attempts to load the `libllama.so` dynamic library…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
llamafactory-cli env
llamafactory version: 0.9.1.dev0
Platform: Linux-5.4.0-144-generic-x86_64-wi…
-
I have followed the tutorial up to querying an unstructured data source, and changed the imports to core/experimental from llama_index.
I get this error when running main.py:
`& : File C:\Users\User\Desktop\S…
-
Hi, I am trying to run the `Llama-3.1 8b + Unsloth 2x faster finetuning.ipynb` notebook provided in the README. However, when I run the second cell in Google Colab, I get this error:
``` bash
------…
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/Mozilla-Ocho/llamafile/blob/master/README.…
-
When loading a model across 2 GPUs, the layers are split evenly, but the GPU memory usage is quite a bit higher on the first GPU:
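A likely cause is that even with an even layer split, context-dependent allocations (the output head and compute scratch buffers, for example) land on the first device. A toy estimator under that assumption, with made-up numbers purely for illustration:

```python
def per_gpu_memory(n_layers: int, layer_mb: float, extra_mb: float,
                   split=(0.5, 0.5)):
    """Rough per-GPU memory in MB: layer weights are divided by `split`,
    while `extra_mb` (output head, scratch/compute buffers) is assumed
    to sit entirely on GPU 0."""
    usage = [n_layers * frac * layer_mb for frac in split]
    usage[0] += extra_mb  # context-size-dependent buffers on the first GPU
    return usage

g0, g1 = per_gpu_memory(n_layers=80, layer_mb=220, extra_mb=2048)
print(round(g0), round(g1))  # prints 10848 8800
```

Under this model, shifting the split toward the second GPU (e.g. a 45/55 ratio) would roughly equalize usage. The `nvidia-smi` readout below shows the actual imbalance: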
```
|=========================================+===================…