-
Issue when saving `unsloth/mistral-7b-instruct-v0.3-bnb-4bit` after training, both in Kaggle and in [gguf-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo)
I have tried converting…
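The conversion attempt is cut off above; for context, here is a minimal sketch of the usual unsloth GGUF export path, assuming the standard `save_pretrained_gguf` API (the output directory and quantization method are placeholder choices, not the poster's exact setup):

```python
from unsloth import FastLanguageModel

# Load the fine-tuned 4-bit checkpoint (names are placeholders).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Export to GGUF; "q4_k_m" is one common llama.cpp quantization choice.
model.save_pretrained_gguf("gguf_out", tokenizer, quantization_method="q4_k_m")
```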
-
No need for `kwargs['load_in_4bit'] = True` when using `quantization_config`:
https://github.com/haotian-liu/LLaVA/blob/c121f0432da27facab705978f83c4ada465e46fd/llava/model/builder.py#L34-L40
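To illustrate the redundancy, a minimal sketch (not the LLaVA builder itself; the model name is a placeholder): once a `BitsAndBytesConfig` with `load_in_4bit=True` is passed, the separate flag adds nothing.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
# quantization_config already switches on 4-bit loading, so also passing
# load_in_4bit=True as a separate kwarg is redundant.
model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.5",  # placeholder model name
    quantization_config=quantization_config,
    device_map="auto",
)
```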
-
**Describe the bug**
`query_input`'s shape is `[batch, pos, n_heads, d_model]`, and the code where the error occurs is meant to transform `query_input` to `[batch, pos, n_heads, d_head]`.
I found t…
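The report is truncated, but the intended shape change can be sketched. This is an illustration assuming TransformerLens-style shapes, with `W_Q` as a hypothetical stand-in for the per-head query projection, not the project's actual code:

```python
import torch

batch, pos, n_heads, d_model, d_head = 2, 16, 8, 512, 64
query_input = torch.randn(batch, pos, n_heads, d_model)
W_Q = torch.randn(n_heads, d_model, d_head)  # hypothetical projection weights

# Project each head's d_model slice down to d_head:
# [batch, pos, n_heads, d_model] -> [batch, pos, n_heads, d_head]
q = torch.einsum("bpnd,ndh->bpnh", query_input, W_Q)
assert q.shape == (batch, pos, n_heads, d_head)
```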
-
```
got prompt
Loading model from /home/sam/ComfyUI/models/Molmo/molmo-7B-D-bnb-4bit
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in .
We will use 90% of…
```
-
Hi, I'm trying to fine-tune the Llama 3.1 8B model, but after fine-tuning it and uploading it to HF, when I try to run it using vLLM I get this error: "KeyError: 'base_model.model.model.layers.0.mlp.dow…
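The `base_model.model.…` prefix in the key suggests raw LoRA adapter weights were uploaded rather than a merged model. A minimal sketch of one common fix, assuming a PEFT-format adapter (the paths and repo names are placeholders):

```python
from peft import AutoPeftModelForCausalLM

# Load base model + adapter, then fold the LoRA deltas into the base
# weights so vLLM sees an ordinary checkpoint without adapter prefixes.
model = AutoPeftModelForCausalLM.from_pretrained("path/to/lora_adapter")
merged = model.merge_and_unload()
merged.push_to_hub("your-username/llama-3.1-8b-merged")  # placeholder repo
```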
-
Over on Hugging Face you all have your native bitsandbytes 4-bit pre-quantized models: https://huggingface.co/collections/unsloth/load-4bit-models-4x-faster-659042e3a41c3cbad582e734. It's really awesome…
-
I had previously installed `unsloth` in an environment using `pip install unsloth`. It was working fine, with inference for the code below taking around 1 min 10 s. I then learnt about the new unsloth_…
-
Can the model be quantized and uploaded independently so it works on a Colab T4 with 12 GB of RAM, or be used with an acceleration library and `device_map="auto"`? Does it support bitsandbytes to convert it to 4-bit?
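A minimal sketch of what that loading could look like, assuming a transformers-compatible checkpoint and the standard bitsandbytes integration (the model name is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # placeholder model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers across GPU/CPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```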
-
OS: Windows. I think my environment is ready; I'm using a Jupyter notebook locally. When I run this:
"from unsloth import FastLanguageModel
import torch
max_seq_length = 8192 # Choose any! We auto sup…
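The snippet is cut off above; a minimal sketch of how that setup typically continues in unsloth notebooks (the model name is an assumption, not the poster's exact code):

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # placeholder model
    max_seq_length=max_seq_length,
    dtype=None,          # auto-detect: float16 on T4, bfloat16 on Ampere+
    load_in_4bit=True,   # use the pre-quantized 4-bit weights
)
```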
-
The reason for this issue is really big models, more than 60 GB: diffusers tries to put all of them into GPU VRAM.
There are a couple of ways to fix it.
The first one is to add this line of code t…
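The suggested line is cut off; one common way in diffusers to stop a very large pipeline from being placed entirely in VRAM is CPU offload. A minimal sketch (the pipeline name is a placeholder, and this may not be the exact fix the poster meant):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # placeholder large model
    torch_dtype=torch.bfloat16,
)
# Moves each sub-model to the GPU only while it is actually in use,
# keeping the rest in CPU RAM instead of VRAM.
pipe.enable_model_cpu_offload()
```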