-
When I run GPTQ W4A16 quantization of LLaMA 2 7B on an A100, I hit this issue. I don't get any error output; the run just stalls here. Is this a memory issue?
quantizing weights: 78%|█████████████████████████████████████████████████████████…
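For context, a rough back-of-envelope check (my own assumption-laden sketch, not tied to any particular quantization library) of whether the full-precision weights alone fit on the card:

```python
# Back-of-envelope GPU memory estimate for GPTQ-style W4A16 quantization
# of a 7B-parameter model. All numbers here are rough assumptions.

def fp16_weight_gib(n_params: float) -> float:
    """GiB needed just to hold the full-precision (fp16) weights."""
    return n_params * 2 / 2**30  # 2 bytes per parameter

llama2_7b = 7e9
print(f"fp16 weights: {fp16_weight_gib(llama2_7b):.1f} GiB")
# On top of this, quantization typically keeps per-layer calibration
# activations and Hessian-like statistics resident, so a 40 GB A100
# can still run out of memory depending on batch/sequence settings.
```

If the process dies silently partway through, checking `nvidia-smi` (GPU memory) and `dmesg` (host OOM killer) during the run is a quick way to tell which side ran out.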
-
### 🥰 Feature Description
Support LLaMA 3.2 models
### 🧐 Proposed Solution
Add the LLaMA 3.2 models to the model lists, including Ollama, GitHub, and so on.
### 📝 Additional Information
_No response_
-
### Requirements
- [X] I have searched the issues of this repository and believe that this is not a duplicate
- [X] I have confirmed this bug exists on the latest version of the app
### Platform
Wi…
-
Mamba-2 is a new version of the Mamba architecture:
- Blog: https://tridao.me/blog/2024/mamba2-part1-model/
- Paper: https://arxiv.org/abs/2405.21060
-
Thanks for your wonderful work and for the clear, clean open-source code.
I ran into a problem when running 04_query_llm.sh: I found that there is no llama-2-13b-chat under libs/llama/. May I know ho…
-
### System Info
We tried on both H100 and A100 GPU machines and hit the same issue on both.
transformers==4.46.1
torch==2.3.1+cu121
### Who can help?
@ArthurZucker
### Information
- [ ] The offi…
-
### 🐛 Describe the bug
torchchat will OOM almost immediately when using the Llama 3.2 11B model. If it doesn't OOM on the first request, it will on the second.
When spinning up the server it'll imm…
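As a rough illustration of why a second request can tip the card over the edge (a sketch using assumed Llama-3.1-style text-backbone values, not torchchat's actual internals):

```python
# Rough KV-cache growth estimate. The config values are assumptions
# (32 layers, 8 KV heads, head dim 128, bf16), not read from torchchat.

def kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128,
                 seq_len=8192, bytes_per_elem=2):
    """GiB of KV cache for one sequence at the given length."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
    return per_token * seq_len / 2**30

print(f"one 8k-token request: {kv_cache_gib():.2f} GiB of KV cache")
# Each concurrent request adds its own cache on top of the ~20 GiB of
# bf16 weights for an 11B model, so a second request can push a 24 GB
# card past its limit even when the first one barely fits.
```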
-
### Your current environment
Hello,
I loaded `llama3.2-3b-instruct` in vLLM and observed a significant decrease in accuracy compared to running the model with Hugging Face Transformers. This i…
-
I am trying to load the [meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8) model, but I am encountering issues with the o…
-
# Llama 3.2 Vision in Workflows
Are you ready to make a difference this Hacktoberfest? We're excited to invite you to contribute by integrating Llama 3.2 Vision into our Workflows ecosystem! This new…