-
## 🐛 Bug
I am trying to optimise the `Qwen/Qwen1.5-4B-Chat` model. As I have only 8GB RAM on my Mac M1, I use 3-bit quantisation and a really small prefill chunk size of 2048. I get the following err…
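The combination of 3-bit quantisation and a prefill chunk size is consistent with an MLC-LLM-style pipeline. Assuming that (the `q3f16_1` quant mode and the output path below are my assumptions, not from the report), a minimal sketch of lowering the prefill chunk in the generated `mlc-chat-config.json`:

```python
import json
from pathlib import Path

# Hypothetical output directory from MLC LLM's convert_weight/gen_config steps,
# assuming the 3-bit q3f16_1 quantisation mode; adjust to your local layout.
config_path = Path("dist/Qwen1.5-4B-Chat-q3f16_1-MLC/mlc-chat-config.json")

config = json.loads(config_path.read_text())
# Smaller prefill chunks lower peak activation memory during prompt
# processing, at the cost of more prefill iterations.
config["prefill_chunk_size"] = 2048
config_path.write_text(json.dumps(config, indent=2))
```

On an 8 GB machine, shrinking the prefill chunk trades prompt-processing speed for a lower memory peak.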
-
Hi, thanks for your wonderful work. I was wondering: if I want to run inference on a model with multiple GPUs, what should I do? I have tried the code below when loading the model with the `device_map` parameter:
```
model…
```
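For reference, a minimal multi-GPU loading sketch with Transformers: `device_map="auto"` lets Accelerate shard the weights across all visible GPUs (the model ID here is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-4B-Chat"  # example model ID

# device_map="auto" lets Accelerate place layers across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```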
-
```
root@dsw-541920-5fd5c64bc4-m25b4:/mnt/workspace/modelscope# xtuner train llama2_7b_chat_qlora_custom_sft_e1_copy.py --deepspeed deepspeed_zero1
[2024-07-01 21:43:15,368] [INFO] [real_accelerator.p…
```
-
Running DPO with Qwen, I run into a flattening problem. My FSDP config is as follows:
```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
fsdp_auto_w…
```
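For context, a minimal DPO sketch with `trl` (versions around 0.9; the model ID and toy dataset are placeholders), to be launched under the FSDP config above with `accelerate launch`:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen1.5-0.5B-Chat"  # placeholder; a small model for illustration
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Toy preference dataset in the prompt/chosen/rejected schema DPOTrainer expects.
train_dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": ["4"],
    "rejected": ["5"],
})

args = DPOConfig(output_dir="dpo-out", per_device_train_batch_size=1, max_steps=1)
trainer = DPOTrainer(model=model, args=args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```

If the flattening error comes from FSDP's flat parameters, setting `fsdp_use_orig_params: true` in the Accelerate FSDP config is one commonly suggested thing to try.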
-
### System Info
bitsandbytes==0.43.1
peft==0.11.0
accelerate==0.31.0
transformers==4.38.2
trl==0.9.4
### Who can help?
@BenjaminBossan @sayakpaul
### Information
- [X] The official …
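The pinned bitsandbytes + peft + trl combination above typically corresponds to a QLoRA-style setup; for anyone cross-checking those versions, a minimal sketch (the model ID is a placeholder):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantisation via bitsandbytes, the usual QLoRA base configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model ID
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach LoRA adapters on the attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```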
-
Hello everyone. I used this code before for LLaMA 2 7B. But now, it doesn't work with any model, even Phi 3!

`!pip install -q accelerate bitsandbytes peft==0.4.0 transformers==4.38.2 trl==0.4.7`
`!pip inst…`
-
### System Info
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4…
```
-
## Goal
- We should have a model folder that can handle different kinds of models (see the sketch after this list):
  - Built-in models (e.g. `janhq/llama3:7b-tensorrt-llm`)
  - Hugging Face GGUF repos with multiple quants (e.g.…
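As a purely hypothetical sketch of what a model-folder entry could look like (all names and fields here are illustrative, not the project's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    """One entry in the model folder: a built-in model or a Hugging Face GGUF repo."""
    model_id: str   # e.g. "janhq/llama3:7b-tensorrt-llm" for a built-in model
    source: str     # "builtin" or "huggingface-gguf"
    quants: list[str] = field(default_factory=list)  # e.g. ["Q4_K_M", "Q8_0"]

# Illustrative registry covering the two source kinds named in the goal.
registry = [
    ModelEntry("janhq/llama3:7b-tensorrt-llm", source="builtin"),
    ModelEntry("TheBloke/Llama-2-7B-GGUF", source="huggingface-gguf",
               quants=["Q4_K_M", "Q5_K_M"]),
]
```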
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is this question already answered in the FAQ? …
-
### 🐛 Describe the bug
Hello,
I am running llama3-70b and mixtral with vLLM on a bunch of different kinds of machines. I encountered wildly different output quality on A10 GPUs vs A100/H…
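When comparing output quality across GPU generations, pinning the dtype and using greedy decoding removes two common sources of variation; a minimal vLLM sketch (the model ID and parallelism degree are placeholders):

```python
from vllm import LLM, SamplingParams

# Fixing dtype avoids comparing float16 kernels on one machine against
# bfloat16 kernels on another; temperature=0 makes decoding greedy.
llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder model ID
    dtype="float16",
    tensor_parallel_size=4,  # adjust to the machine under test
)
params = SamplingParams(temperature=0.0, max_tokens=64)
out = llm.generate(["What is the capital of France?"], params)
print(out[0].outputs[0].text)
```

Even with identical settings, different kernel selections across architectures can legitimately change greedy outputs slightly; wildly different quality usually points at something else, such as a dtype, tokenizer, or checkpoint mismatch.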