-
Has anyone been able to get the LLaMA-2 70B model to run inference with 4-bit quantization using HuggingFace? Here are some variations of the code I've tried, based on various guides:
```python3
nam…
```
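For anyone comparing notes, here is a minimal sketch of the 4-bit loading pattern that is generally expected to work, assuming recent versions of transformers, bitsandbytes, and accelerate; the model id and prompt are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder model id; substitute the checkpoint you are actually using.
model_id = "meta-llama/Llama-2-70b-hf"

# NF4 4-bit weights with bf16 compute: the usual QLoRA-style configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard layers across all visible GPUs
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```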
-
**Hardware**:
CPU: Xeon® E5-2630 v2, but limited to 16 GB of RAM, as that is what the vast.ai instance provides.
GPU: 4x A40 --> 192 GB of VRAM in total
**OS**
Linux
**Python**
3.10
**CUDA**
12.2
**packa…
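Given that layout (four 48 GB A40s but only 16 GB of system RAM), one thing worth checking is whether loading is silently offloading to CPU. A sketch of capping per-device memory follows; the exact limits and the checkpoint name are assumptions to tune:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical per-device caps: leave headroom on each 48 GB A40 and keep the
# CPU budget small, since the instance only has 16 GB of system RAM.
max_memory = {i: "44GiB" for i in range(4)}
max_memory["cpu"] = "8GiB"

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # assumed checkpoint, per the question above
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
    max_memory=max_memory,
)
```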
-
### System Info
Hi guys, I just fine-tuned Alpaca (LLaMA 7B base model) on a custom dataset using the Trainer API. After the training process completed, I received the following error:
```python
…
```
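Since the traceback above is cut off, here is a minimal sketch of the setup being described, to make the failure point easier to place; the base checkpoint and the toy dataset are placeholders:

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Placeholder base checkpoint; the poster fine-tunes a LLaMA 7B base model.
base = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Stand-in for the custom instruction dataset, already tokenized.
texts = ["### Instruction: say hi\n### Response: hi"]
train_dataset = Dataset.from_dict(dict(tokenizer(texts)))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="alpaca-out", num_train_epochs=1),
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("alpaca-out")  # the reported error appears after training completes
```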
-
Hey,
I'm trying to use a quantized model due to memory issues.
We usually load the model like this:
```
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dty…
```
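For reference, a complete version of that pattern with the usual NF4 settings; the model id is a placeholder, and the specific dtype/quant-type choices are common defaults rather than requirements:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 while weights stay 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=quantization_config,
    device_map="auto",
)
```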
-
OS: Debian 9 amd64
```
/build/servo/target/release/deps/libgstreamer_player-47ed31aa97c38542.rlib(gstreamer_player-47ed31aa97c38542.gstreamer_player.1u6u9zpa-cgu.8.rcgu.o):gstreamer_player.1u6u9zp…
```
-
### When I run the following script
```
import torch
from accelerate import Accelerator, PartialState
from peft import LoraConfig
from tqdm import tqdm
from transformers import AutoTokenizer, …
```
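For context, a rough sketch of how those imports usually fit together; the checkpoint name, LoRA hyperparameters, and target modules are assumptions for a LLaMA-style model:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute the model the script actually loads.
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Typical LoRA hyperparameters for a LLaMA-style model (assumed values).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```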
-
When I run the int4 version locally, I get the following error:
```
(MiniCPMV) yushen@user-MS-7E06:~/ai/MiniCPM-V$ python web_demo_2.5_gy.py --device cuda
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not use…
```
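For comparison, the published int4 checkpoint is normally loaded along these lines (a sketch following the MiniCPM-V README; note that the `Unused kwargs` line is a warning, not necessarily the actual failure):

```python
from transformers import AutoModel, AutoTokenizer

# The int4 checkpoint requires bitsandbytes and a CUDA GPU; do not call
# .to("cuda") on an already-quantized 4-bit model.
model_id = "openbmb/MiniCPM-Llama3-V-2_5-int4"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model.eval()
```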
-
python /services/srv/MiniCPM-V/web_demo_2.5.py --device cuda [web_demo_2.5.py has already been modified to use the MiniCPM-Llama3-V-2_5-int4 model]
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs ar…
-
### System Info
I am running a script inside a Docker container in a Linux environment.
### Who can help?
@younesbelkada this issue is similar to, but not the same as, #24137.
### Information
- [ ] The o…
-
**Describe the bug**
I just did a full fine-tune of the `florence-2-large-ft` model, and now I can't run it.
# Command to reproduce
```txt
CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir output/flore…
```