OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large Language Models (LLMs), Large Action Models (LAMs), Large Multimodal Models (LMMs), and Visual Language Models (VLMs).
https://www.OpenAdapt.AI
MIT License

Investigate QLoRA #208

Open abrichr opened 1 year ago

abrichr commented 1 year ago

https://github.com/artidoro/qlora

https://arxiv.org/abs/2305.14314

We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. … Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. … Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA

Ultimately we want to get to a place where we can fine-tune models offline. Maybe that involves QLoRA, maybe something else.

This task involves exploring what it would take to get QLoRA (or some alternative) up and running locally.
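For reference, a minimal sketch of what local QLoRA-style fine-tuning might look like with the Hugging Face transformers / peft / bitsandbytes stack. The model id, target modules, and LoRA hyperparameters below are placeholder assumptions, not a tested configuration:

```python
# Sketch: QLoRA-style fine-tuning with 4-bit NF4 quantization + LoRA adapters.
# Assumes transformers, peft, and bitsandbytes are installed; model id and
# hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-7b"  # placeholder; any causal LM supported by bitsandbytes

# 4-bit NF4 quantization with double quantization, as described in the QLoRA paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Freeze the quantized base weights and prepare for k-bit training
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters; only these small matrices are trained
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # module names vary by architecture
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, train with transformers.Trainer or a custom loop on our recordings.
```

The memory savings come from keeping the base weights in 4-bit NF4 (with double quantization) and only training the LoRA adapter matrices, which is what lets a large model fit on a single consumer/workstation GPU per the paper.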

FFFiend commented 1 year ago

See https://github.com/kuleshov-group/llmtune/blob/main/llmtune/engine/lora/lora.py for an implementation.