-
In this repo the Llama3 tokenizer sets the `<|image|>` special token to `128011` https://github.com/meta-llama/llama-models/blob/ec6b56330258f6c544a6ca95c52a2aee09d8e3ca/models/llama3/api/tokenizer.py#L79-L101…
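A quick way to confirm the mapping (my own illustration, not from the issue, assuming the `llama_models` pip package and a local copy of the Llama3 tokenizer model):
```python
# Illustrative only: check which id the <|image|> special token gets in the
# llama-models Llama3 tokenizer. Assumes `pip install llama-models` and a
# local tokenizer.model file; the path below is an assumption.
from llama_models.llama3.api.tokenizer import Tokenizer

tok = Tokenizer(model_path="tokenizer.model")
print(tok.special_tokens["<|image|>"])  # expected: 128011 per the linked lines
```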
-
I'm using:
- macOS Ventura 13.2.1
- MacBook Air M1
When I execute the command:
```python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s```
I get the following message:
```
INFO:root:C…
-
I run: `python setup_env.py -md /home/disk1/Llama3-8B-1.58-100B-tokens -q i2_s`
and get:
FileNotFoundError: [Errno 2] No such file or directory: './build/bin/llama-quantize'
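The error means the script expected the llama.cpp quantize binary at `./build/bin/llama-quantize` but the build step never produced it. A minimal pre-flight check, sketched on my own (the path comes straight from the error message, not from the project's internals):
```python
# Illustrative pre-flight check: confirm the quantize binary exists before
# running setup_env.py. The expected path mirrors the FileNotFoundError above.
import os
import sys

quantize_bin = "./build/bin/llama-quantize"
if not os.path.exists(quantize_bin):
    sys.exit(f"{quantize_bin} not found -- build the bundled llama.cpp first "
             "(the project's cmake build step), then re-run setup_env.py.")
print("quantize binary found, proceeding")
```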
-
I'm trying to fine-tune a model (Llama 3.1 8B Instruct) on a custom dataset.
The dataset consists of three fields: `input`, `metadata`, and `output`. I could use the Alpaca-style prompt, but I don't think it fit…
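One alternative, sketched below under my own assumptions rather than any established recipe, is a chat-style format that keeps the three fields separate and lets the Llama 3.1 Instruct chat template do the rendering:
```python
# Hypothetical mapping from the dataset's three fields to chat messages.
# Putting `metadata` in the system turn is an assumption, not a known recipe.
def format_example(example: dict) -> list[dict]:
    return [
        {"role": "system", "content": f"Metadata: {example['metadata']}"},
        {"role": "user", "content": example["input"]},
        {"role": "assistant", "content": example["output"]},
    ]

# Rendered with the model's own chat template, e.g.:
# text = tokenizer.apply_chat_template(format_example(row), tokenize=False)
```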
-
Hello,
I tried to reproduce the paper's results and obtained similar numbers for Llama2-7B, 13B, 70B, and Llama-3 8B.
However, when I tested Llama3-70B using the optimized rotation matrix you p…
-
**Description**
I noticed a huge difference in memory usage for runtime buffers and the decoder between Llama3 and Llama3.1.
**Triton Information**
What version of Triton are you usin…
-
### Requirements
- [X] I have searched the issues of this repository and believe that this is not a duplicate
- [X] I have confirmed this bug exists on the latest version of the app
### Platform
Wi…
-
### Your current environment
```text
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC …
-
Ovis1.6-Llama3.2-3B-GPTQ-Int4: how can inference be run on the CPU?
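For illustration only, the generic transformers-style CPU loading path would look like the sketch below. The repo id is an assumption, and whether the Int4 GPTQ kernels actually support CPU execution depends on the installed quantization backend:
```python
# Hypothetical sketch of CPU-only loading via transformers. Ovis is a
# multimodal model with custom code, so trust_remote_code is required;
# the text-only generate() call here is simplified for brevity.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIDC-AI/Ovis1.6-Llama3.2-3B-GPTQ-Int4"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu",        # force CPU placement
    trust_remote_code=True,
)

inputs = tokenizer("Hello", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```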
-
### 🚀 The feature, motivation and pitch
Llama3.2 Vision (Mllama) models require the model runner to be an "Encoder_Decoder_Model_Runner",
which includes:
1. preparing "encoder_seq_lens" and "encoder_seq_len…