-
The training process is quite slow, whereas using 8-bit HQQ speeds it up by more than tenfold. Is this normal, or have I missed something in my code?
```python
import torch
from transformers import EetqConfi…
```
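For reference, loading a model with 8-bit HQQ through transformers looks roughly like the sketch below; the model name and group size are illustrative assumptions, not taken from the original post.
```python
import torch
from transformers import AutoModelForCausalLM, HqqConfig

# 8-bit HQQ quantization; group_size=64 is an illustrative choice
quant_config = HqqConfig(nbits=8, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # hypothetical model choice
    torch_dtype=torch.float16,
    device_map="auto",
    quantization_config=quant_config,
)
```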
-
Is it possible to do the fine-tuning with the models quantized, using QLoRA?
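For context, the standard QLoRA pattern with transformers, bitsandbytes, and peft is sketched below; the model name, target modules, and LoRA hyperparameters are illustrative assumptions.
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit NF4 (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # hypothetical base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach trainable LoRA adapters on top of the frozen 4-bit weights
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```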
-
(Q)DoRA, an alternative to (Q)LoRA, is quickly proving to be a superior technique for closing the gap between full fine-tuning (FFT) and PEFT.
Known existing implementations:
- https://github.com/huggingface/…
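In peft, DoRA is exposed as a flag on the existing LoRA config, so a minimal sketch differs from plain LoRA by one argument (target modules are illustrative):
```python
from peft import LoraConfig

# Same config as LoRA, with weight decomposition into magnitude and direction
dora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # illustrative
    use_dora=True,
    task_type="CAUSAL_LM",
)
```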
-
```
Loading checkpoint shards: 0%| | 0/2 [00:00
```
-
Hello,
I have a question regarding GPU memory consumption during inference.
Before fine-tuning a model with QLoRA, the torchtune.LoRALinear modules will convert the original LLM weights to NF4, a…
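As a general aid for questions like this, peak GPU memory during inference can be measured directly; the sketch below uses a stand-in linear layer rather than torchtune, purely for illustration.
```python
import torch

model = torch.nn.Linear(4096, 4096).cuda().half()  # stand-in for the real model
x = torch.randn(1, 4096, device="cuda", dtype=torch.float16)

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    model(x)

print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```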
-
For workloads such as QLoRA, we can save and upload (or reuse existing) pre-quantized model weights, which would have a couple of benefits:
- Allow users to save disk space by only working with 4-…
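A sketch of what this could look like with transformers and bitsandbytes, assuming the installed versions support 4-bit serialization (model and directory names are illustrative):
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize once to 4-bit NF4...
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # hypothetical base model
    quantization_config=bnb_config,
    device_map="auto",
)

# ...then save the already-quantized weights so later runs skip the
# full-precision download and on-the-fly quantization step.
model.save_pretrained("llama-2-7b-nf4")
reloaded = AutoModelForCausalLM.from_pretrained("llama-2-7b-nf4", device_map="auto")
```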
-
Hi Intel team,
I met an issue when I ran the script "qlora_finetune_llama2_70b_pvc_1550_4_card.sh" with DeepSpeed parameters.
When running the code, errors occur whenever a checkpoint step …
-
I cannot train Qwen2 7B on a 4090 GPU as it would result in out-of-memory (OOM) errors due to the loading of the embedding layer. This process is anticipated to demand over 27 GB of VRAM, exceeding the…
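For context, a back-of-envelope estimate shows why full fine-tuning of a ~7B model cannot fit in 24 GB; the parameter count and optimizer layout below are assumptions, not figures from the original post.
```python
# Rough VRAM arithmetic for full fine-tuning of a ~7.6B-parameter model
params = 7.6e9

weights = params * 2 / 1e9  # bf16 weights:   ~15.2 GB
grads = params * 2 / 1e9    # bf16 gradients: ~15.2 GB
adam = params * 8 / 1e9     # fp32 exp_avg + exp_avg_sq: ~60.8 GB

print(f"weights {weights:.1f} GB, grads {grads:.1f} GB, optimizer {adam:.1f} GB")
# The total far exceeds a 24 GB RTX 4090 before activations are counted,
# which is why 4-bit quantization plus LoRA (QLoRA) is the usual workaround.
```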
-
Changing only one line in the config file, namely
```yaml
quantize: bnb.nf4
```
increased the memory usage from 14 GB to 18 GB.
```
Epoch 5 | iter 965 step 965 | loss train: 1.182, val: 1.0…
```
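One possible explanation, assuming this is a litgpt-style config: quantization only reduces memory when paired with a true half-precision setting, because mixed precision keeps a full-precision copy of the weights alongside the quantized ones. A hypothetical pairing:
```yaml
quantize: bnb.nf4
precision: bf16-true  # assumed setting; mixed precision would keep fp32 copies
```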
-
**base-model: Weyaxi/Dolphin2.1-OpenOrca-7B**
**Scenario:**
- followed the guidelines at https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/LLM-Finetuning/QLoRA…