-
I'm on an A100 Arch Linux box, on [latest master](https://github.com/zml/zml/commit/d366f00aed18807f558bcf4a5defb8150d550c3f) at the time of this issue.
I'm attempting to run Llama 3.1 with `cuda=true` runt…
-
What quantisation algorithm was used in the unsloth/Llama-3.2-1B-bnb-4bit model (https://huggingface.co/docs/transformers/main/en/quantization/overview)? Is it int4_awq or int4_weightonly?
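One way to answer this without guessing is to read the quantisation metadata the checkpoint ships in its config.json. A minimal sketch using transformers, assuming the repo id above:

```python
from transformers import AutoConfig

# Read the quantisation metadata stored in the repo's config.json.
# For bitsandbytes-quantized checkpoints this typically reports the
# quant method and the 4-bit quant type (e.g. nf4 vs fp4), which
# answers the question directly.
config = AutoConfig.from_pretrained("unsloth/Llama-3.2-1B-bnb-4bit")
print(config.quantization_config)
```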
-
**Describe the bug**
When I run the example from examples/python/awq-quantized-model.md, but with llama-3.2-3b swapped in for phi-3, I get an error: `AttributeError: 'NoneType' objec…`
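For reference, loading an AWQ-quantized Llama checkpoint directly through transformers (with autoawq installed) usually looks like the sketch below; the model id is illustrative, not necessarily the checkpoint from this report:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch, not the repo's example: transformers can load
# AWQ checkpoints directly when autoawq is installed. The model id
# below is an assumption.
model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```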
-
Hi, I think I found a bug in Unsloth. For clarity, I'm sharing the code of Unsloth's Llama 3.1 training notebook with just one small change. Can anyone help me check why the train…
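Since the notebook code is truncated here, a minimal sketch of the usual Unsloth Llama 3.1 setup for comparison; the model name and LoRA hyperparameters are assumptions, not the notebook's actual values:

```python
from unsloth import FastLanguageModel

# Standard Unsloth setup (values assumed, not from the notebook).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = 2048,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,            # LoRA rank
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)
```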
-
## 🐛 Bug
## To Reproduce
Steps to reproduce the behavior:
1. Download the weights for Llama 3.2 1B and 3B from Hugging Face: https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q0f16-MLC a…
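The remaining reproduce steps are truncated above; for context, MLC's documented Python quickstart for these prebuilt weights looks roughly like this (an assumed setup, not necessarily the reporter's exact command):

```python
from mlc_llm import MLCEngine

# Load the prebuilt MLC weights via the HF:// scheme, per MLC's
# quickstart docs, and stream one chat completion.
model = "HF://mlc-ai/Llama-3.2-1B-Instruct-q0f16-MLC"
engine = MLCEngine(model)
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
engine.terminate()
```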
-
**Describe the bug**
The model's response doesn't stop; it keeps generating. I tried both `swift deploy` and `vllm`.
Training arguments:
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 \
USE_HF=1 \
CUDA_VISIBLE…
```
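Runaway generation like this is often an EOS / stop-token mismatch between the fine-tuned chat template and the serving stack. A hedged check on the vLLM side (the model path and stop marker below are placeholders, not values from this report):

```python
from vllm import LLM, SamplingParams

# Sketch only: explicitly pass the stop marker the fine-tuned chat
# template emits, so generation terminates even if the model's EOS
# token is misconfigured. Values below are placeholders.
llm = LLM(model="path/to/merged-model")
params = SamplingParams(
    max_tokens=512,
    stop=["<|eot_id|>"],  # Llama 3 style end-of-turn marker (assumed)
)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```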
-
### Describe the bug
```
interpreter --local
Open Interpreter supports multiple local model providers.
[?] Select a provider:
 > Ollama
   Llamafile
   LM Studio
   Jan
…
```
-
User Anton on Discord reported:
```
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Using cached diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Requirement already satisfied: jinja2>=2.11.3 …
```
-
### What is the issue?
**Description:**
I encountered an issue where the **LLaMA 3.2 Vision 11b** model loads entirely into CPU RAM, without utilizing GPU memory as expected. The issue occurs on m…
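As a diagnostic sketch (not from this report), Ollama's REST API exposes `/api/ps`, which reports how much of a loaded model sits in VRAM versus total size; the host and port below assume a default local install:

```python
import requests

# Query Ollama's /api/ps endpoint and compare total model size with
# the portion resident in VRAM. size_vram == 0 would confirm the
# model is running entirely on CPU.
resp = requests.get("http://localhost:11434/api/ps")
for m in resp.json().get("models", []):
    print(m["name"], "size:", m["size"], "size_vram:", m.get("size_vram"))
```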
-
I saw you used something like this:
```
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers = True, # False if not finetuning vision part
    finetune_language_layers = True, # False if not finetuning language part
)
```