Open · kalomaze opened this issue 4 months ago
This is a known issue. It doesn't work because the underlying peft library tries to dequantize the weights to calculate the norm and scaling after they have been sharded.
The workaround, unfortunately, would be to implement it completely manually.
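To make the failure mode concrete, here is a rough sketch (my own illustration, not peft's actual code; the function name dora_magnitude_init is made up) of the computation DoRA needs at init time: the magnitude vector comes from the norm of the merged weight, so the base weight has to be dequantized, and gathered if it has been sharded, before that norm can be taken.

```python
import torch

# Rough illustration only: DoRA's magnitude vector is initialized from the
# norm of the merged weight W + scaling * (B @ A), so a quantized base weight
# must be dequantized (and gathered if sharded) before this can be computed.
def dora_magnitude_init(base_weight: torch.Tensor,  # (out_features, in_features), dequantized
                        lora_A: torch.Tensor,       # (r, in_features)
                        lora_B: torch.Tensor,       # (out_features, r)
                        scaling: float) -> torch.Tensor:
    merged = base_weight + scaling * (lora_B @ lora_A)
    # one norm per output feature
    return torch.linalg.norm(merged, dim=1)
```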
I have also observed that when peft_use_dora: true is set, the loss curves seem identical to regular 4-bit QLoRA.
I did a test run and was able to confirm that the PEFT dora_init method is called on the LoRA adapter when peft_use_dora: true is set, and that the loss curve is practically identical to regular 8-bit LoRA.
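A quick way to sanity-check whether DoRA was actually attached (a sketch only; it assumes peft exposes the magnitude tensors under the lora_magnitude_vector name, which may differ across versions):

```python
from torch import nn

def check_dora_active(model: nn.Module) -> None:
    """Sketch: after wrapping with peft (use_dora=True), verify that DoRA
    magnitude tensors exist and are trainable. Assumes peft names them
    'lora_magnitude_vector'; adjust for your peft version."""
    dora_params = [n for n, p in model.named_parameters()
                   if "lora_magnitude_vector" in n and p.requires_grad]
    print(f"trainable DoRA magnitude tensors: {len(dora_params)}")
    assert dora_params, "no DoRA parameters found despite peft_use_dora: true"
```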
Please check that this issue hasn't been reported before.
Expected Behavior
With 8-bit LoRA enabled and peft_use_dora: true, the training run should function (as it does when QLoRA is used).

Current behaviour
With this config:
and
DoRA training appears to fail.
When I use:
And train with peft_use_dora: true, the loss curves seem identical to regular 4-bit QLoRA. This leads me to suspect that it is still using regular QLoRA and that DoRA is not functioning, especially considering this blog post.

This is the full error log trace:
Steps to reproduce
Config yaml
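As an illustration only (not the reporter's actual file), a minimal config along these lines matches the setup described above; the model name and LoRA hyperparameters are placeholders, and the relevant fields are load_in_8bit: true, adapter: lora, and peft_use_dora: true.

```yaml
# Illustrative sketch only; not the original yaml from this issue.
base_model: meta-llama/Llama-2-7b-hf   # placeholder model
load_in_8bit: true        # 8-bit base weights
adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
peft_use_dora: true       # the flag under discussion
```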