-
### tl;dr
There has been a change in behavior between [`main` (e51ede12e1b639fd30c8797eb3bbd8b9fb3de826)](https://github.com/lmstudio-ai/mlx-engine/commit/e51ede12e1b639fd30c8797eb3bbd8b9fb3de826) an…
-
Here is the summary:
`unsloth/mistral-7b-v0.3-bnb-4bit` fails with `KeyError: 'layers.0.mlp.down_proj.weight'`
`unsloth/Qwen2.5-7B-Instruct-bnb-4bit` fails with `KeyError: 'layers.0.mlp.down_pro…`
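For reference, a minimal repro sketch, assuming the models are loaded through `mlx_lm` (which mlx-engine builds on); the repo name is taken from the list above:
```
from mlx_lm import load

# Loading the checkpoint fails while mapping weight names:
model, tokenizer = load("unsloth/mistral-7b-v0.3-bnb-4bit")
# -> KeyError: 'layers.0.mlp.down_proj.weight'
```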
-
[https://huggingface.co/upstage/solar-pro-preview-instruct](https://huggingface.co/upstage/solar-pro-preview-instruct)
Upstage released Solar Pro, a new 22B model, and this thing is crazy powerful. I was just won…
-
Environment: torch 2.4, CUDA 12.4, unsloth main.
Below is the code that errored:
```
from unsloth import FastLanguageModel
import torch

model_id = "unsloth/gemma-2-2b-it-bnb-4bit"
# the call was truncated in the report; the standard unsloth load is:
model, tokenizer = FastLanguageModel.from_pretrained(model_name=model_id, load_in_4bit=True)
```
-
I'm currently having issues attempting to quantize, save, and then load the model using HF Transformers.
Is there any known working method for quantizing Aria (preferably to 4-bit)?
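For context, a sketch of the standard Transformers route, assuming the `rhymes-ai/Aria` checkpoint and bitsandbytes NF4; treat it as the pattern being attempted, not a confirmed recipe:
```
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize to 4-bit NF4 on load, then persist the quantized weights.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "rhymes-ai/Aria", quantization_config=bnb, trust_remote_code=True
)
model.save_pretrained("aria-4bit")  # reload later via from_pretrained("aria-4bit")
```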
-
Although it may be out of scope, it would be nice to have an example of computing with 4-bit and 8-bit tensors, to save memory bandwidth; something like the sketch below.
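A sketch with `bitsandbytes.functional` (shapes and dtypes are arbitrary placeholders):
```
import torch
import bitsandbytes.functional as F

w = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

# 4-bit (NF4) round trip: packs two weights per byte plus per-block absmax stats.
w4, state4 = F.quantize_4bit(w, quant_type="nf4")
w4_deq = F.dequantize_4bit(w4, state4)

# 8-bit blockwise round trip.
w8, state8 = F.quantize_blockwise(w)
w8_deq = F.dequantize_blockwise(w8, state8)
```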
-
Hi. I'm raising this issue because I am experiencing much slower inference times with Gemma-1 models.
> Environment:
> - xformers 0.0.26.post1 pypi_0 pypi
> - unsloth …
-
Hello, my situation is as follows:
I implemented a QLoRA adapter to use with LLMs (currently bloom-560m). It works fine so far; after fine-tuning I get over 90% accuracy on my task. However, after sa…
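For reference, the save/reload pattern to compare against; a sketch assuming PEFT, with a hypothetical adapter directory:
```
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Reload the base model, then attach the saved QLoRA adapter.
base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
model = PeftModel.from_pretrained(base, "my-qlora-adapter")  # hypothetical path

# Optionally fold the adapter into the base weights for inference.
model = model.merge_and_unload()
```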
-
From the way it is written in the paper, int4 and int8 quantization are supported, but how do I set them?
According to another [issue](https://github.com/openvla/openvla/issues/10), I should set the c…
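If OpenVLA follows the usual Transformers convention, the switch would be a `BitsAndBytesConfig` passed at load time; a sketch (repo id and model class as in the OpenVLA README, but unverified here):
```
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# int4 on load; use load_in_8bit=True instead for int8.
quant = BitsAndBytesConfig(load_in_4bit=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    quantization_config=quant,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```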
-
Currently, `BitsAndBytesLinearQuant4bit` always calls `bitsandbytes.functional.quantize_4bit` for the submodule. This is somewhat touchy for CPU tensors, because `quantize_4bit` only works on GPU tensors …
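One possible workaround, sketched with illustrative names (not the actual implementation): round-trip CPU inputs through the GPU around the call.
```
import torch
import bitsandbytes.functional as F

def quantize_4bit_any(w: torch.Tensor):
    # quantize_4bit only accepts CUDA tensors, so move CPU inputs over first.
    if w.is_cuda:
        return F.quantize_4bit(w)
    wq, state = F.quantize_4bit(w.cuda())
    return wq.cpu(), state  # note: the quant state's absmax tensor stays on the GPU
```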