zabirauf closed this issue 6 months ago
I can reproduce this. BitsAndBytes works, but HQQ runs into memory issues even with a very short context length (e.g. 512).
The OOM seems to be happening at the model-loading stage: we load and quantize the pretrained weights in parallel. You can try manually setting n_workers
to a lower number here: https://github.com/AnswerDotAI/fsdp_qlora/blob/0b57d37e7579fc5663638bcf9ba373ab7d52396c/train.py#L622 and running again.
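The trade-off being described can be sketched as follows. This is a minimal illustration, not the repo's actual loader: the function name `load_and_quantize_parallel` and the `quantize_fn` parameter are assumptions for the example. The key point is that `n_workers` bounds how many unquantized weight tensors sit in memory at once, so lowering it trades load time for peak memory.

```python
from concurrent.futures import ThreadPoolExecutor

def load_and_quantize_parallel(named_weights, quantize_fn, n_workers=2):
    """Quantize (name, weight) pairs using n_workers threads.

    A lower n_workers means fewer full-precision weights are held
    in memory simultaneously, reducing peak memory at the cost of
    a slower load.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        results = executor.map(
            lambda item: (item[0], quantize_fn(item[1])), named_weights
        )
        return dict(results)

# Toy usage: "quantization" here is just integer halving as a stand-in.
weights = [("layer0", 4), ("layer1", 8)]
quantized = load_and_quantize_parallel(weights, lambda w: w // 2, n_workers=1)
```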
https://github.com/AnswerDotAI/fsdp_qlora/commit/cf614264fe7b1cdbadaf35172934a69d8d31e7de - with this commit you should be able to load using HQQ without OOM on a 24GB GPU. Load time increased from 5 minutes to 10 minutes. Will investigate further whether this was caused by recent changes in the HQQ repo.
Setup:
I have 1x3090 and 1x4090, and I'm trying to follow the instructions in README.md to fine-tune using HQQ, but I run into a CUDA out-of-memory error.
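For reference, the command I'm running is along these lines (a sketch based on the README's HQQ fine-tuning flags; the model name and batch size here are illustrative placeholders, not my exact values):

```shell
# Hypothetical invocation sketch; flag names follow fsdp_qlora's README.
python train.py \
  --model_name meta-llama/Llama-2-7b-hf \
  --train_type hqq_lora \
  --context_length 512 \
  --batch_size 2
```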
Error