-
### System Info
```
bitsandbytes==0.43.1
sentencepiece==0.1.97
huggingface_hub==0.23.2
accelerate==0.30.1
tokenizers==0.19.1
transformers==4.41.1
trl==0.8.6
peft==0.11.1
datasets==2.14.6
…
```
-
Hi, I ran into a similar problem when fine-tuning Qwen-VL-Chat-Int4. I have already changed the device_map and disabled fp16, but now I have another question; have you encountered it as well?
```
device_map2:{'': 0}
Traceback (most recent call last):
File "/data/zjj/Qwen/Qwen-VL/finetune.py", lin…
```
-
When will a quantized version of the 52B model be available? Quantizing with the official https://github.com/Tele-AI/Telechat/tree/master/quant tooling raises an error; I am using an A10 GPU.
```
Traceback (most recent call last):
File "/*****/quant/quant.py", line 27, in
model.quantize(example…
```
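The truncated traceback aside, it is worth checking capacity first. A rough sketch, assuming the 52B parameter count from the question and an A10's 24 GB of VRAM: even the 4-bit weights alone exceed one card, so quantizing without CPU offload would run out of memory regardless of the specific error.

```python
# Back-of-the-envelope: do the 52B weights fit on one 24 GB A10?
# (Weights only; real quantization also needs activations and calibration data.)
params = 52e9
bytes_fp16 = params * 2    # fp16 weights
bytes_int4 = params * 0.5  # 4-bit weights, ignoring scales/zero-points
a10_vram = 24e9            # A10: 24 GB

print(f"fp16: {bytes_fp16 / 1e9:.0f} GB, int4: {bytes_int4 / 1e9:.0f} GB")
print("fits on one A10 at 4 bit:", bytes_int4 < a10_vram)
```

Layer-by-layer quantizers can offload to CPU RAM, but that has to be configured explicitly; the default single-GPU path will not work at this size.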
-
### System Info
```
llama-index 0.10.61
llama-index-agent-openai 0.2.9
llama-index-cli 0.1.13
llama-index-core 0.10.61
llama-index-embeddings-huggingface 0.2.2
llama-index-indices-managed-…
```
-
My training set is very large (over a million samples), and loading it all at once causes an OOM, so I read the data in streaming mode, but then training is very slow.
GPU utilization is very low.
The CPU is completely saturated.
Training arguments:
```
SftArguments(train_type='sft', model_type='internvl2-8b', model_revision='master', full_deter…
```
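Reading the symptom (GPU idle, CPU pegged): with streaming, each sample is typically decoded and tokenized on the fly inside the training loop, so the GPU waits on the CPU for every batch. A minimal sketch of the principle of the fix, moving preprocessing into a background prefetch thread. All names here are illustrative, not ms-swift API; in a real run the analogous knobs are the trainer's `dataloader_num_workers` and pre-tokenizing the data ahead of time.

```python
import queue
import threading

def slow_preprocess(sample):
    # Stand-in for CPU-heavy work (image decoding, tokenization).
    return sample * 2

def streaming_source(n):
    # Stand-in for an iterable streaming dataset.
    for i in range(n):
        yield i

def prefetching_loader(source, buffer_size=64):
    """Run preprocessing in a background thread so the consumer
    (the GPU step) does not block on the CPU for every sample."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def worker():
        for sample in source:
            q.put(slow_preprocess(sample))
        q.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            break
        yield item

batches = list(prefetching_loader(streaming_source(5)))
print(batches)  # [0, 2, 4, 6, 8]
```

A single producer feeding a FIFO queue preserves sample order; multiple workers would trade ordering for more CPU parallelism.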
-
# Quantized the YOLOv5 model: mAP@0.5 is high (around 0.47), but the detection results are outrageous and unexpected
These days I have tried to do some quantification with yolov5_nano by …
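For what it is worth, one common reason a quantized detector can look reasonable on a metric yet produce wild outputs is scale dilation from outliers. A plain-numpy sketch (not the YOLOv5 export path; `quantize_int8` is an illustrative helper) of how a single outlier inflates a symmetric per-tensor scale and coarsens the resolution for every other value:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: scale from max |x|."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
w[0] = 50.0  # one outlier dominates the scale

q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
print(f"scale={scale:.4f}, max abs round-trip error={err:.4f}")
```

With the outlier, the quantization step is ~0.39 instead of ~0.03, so every small weight is rounded onto a much coarser grid. Per-channel scales or outlier clipping during calibration are the usual mitigations.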
-
### Describe the issue
Hello,
I'm trying to get quantization parameters from an input tensor, such as the **quantization type** _(static linear per-tensor / static linear per-channel / dynamic)_ and th…
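Independent of which ONNX Runtime call exposes this, the underlying math is small enough to sketch directly (plain numpy, not the ORT API; `affine_params` is a hypothetical helper): per-tensor statistics give one (scale, zero_point) pair for the whole tensor, per-channel gives one pair per channel, and "dynamic" means the pair is recomputed from each incoming tensor at runtime rather than fixed at calibration time.

```python
import numpy as np

def affine_params(x, num_bits=8):
    """Static affine (asymmetric) quantization parameters for one tensor."""
    qmin, qmax = 0, 2**num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

x = np.array([[-1.0, 0.0, 1.0],
              [-2.0, 0.0, 4.0]], dtype=np.float32)

# Per-tensor: one (scale, zero_point) pair for everything.
print(affine_params(x))

# Per-channel: one pair per channel (here: per row).
print([affine_params(row) for row in x])
```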
-
### Feature request
Support for DBRX Instruct model in bitsandbytes
### Motivation
DBRX Instruct is reported to be among the best open LLMs, but its 132B parameters make it unusable for most. I tried this
…
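A rough, weights-only footprint (using the 132B figure from the request; KV cache, activations, and quantization metadata excluded) shows why 4-bit support is the difference between a multi-GPU node and a single large accelerator:

```python
GIB = 1024**3
params = 132e9  # DBRX parameter count

footprint = {name: params * bytes_per_param / GIB
             for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("nf4", 0.5)]}
for name, gib in footprint.items():
    print(f"{name}: {gib:.0f} GiB of weights")
```

At roughly 61 GiB of weights, NF4 brings the model within reach of a single 80 GB card, whereas fp16 needs several.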
-
### System Info
Ubuntu
### Reproduction
```
model_id = "google/gemma-2b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bflo…
```
-
Running on a Mac M3 Max (128 GB).
Run this code:
```
from transformers import AutoModel, AutoTokenizer
MAX_LENGTH = 128
model = AutoModel.from_pretrained("unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4b…
```
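Backend support aside (bitsandbytes builds target CUDA GPUs, so 4-bit loading is unlikely to work on Apple Silicon with these versions), the model also cannot fit in memory. A weights-only estimate, assuming the 405B parameter count in the checkpoint name:

```python
GIB = 1024**3
params = 405e9                     # Meta-Llama-3.1-405B
weights_4bit = params * 0.5 / GIB  # 0.5 bytes per parameter at 4 bit
print(f"4-bit weights alone: {weights_4bit:.0f} GiB vs 128 GiB of unified memory")
```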