artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License
9.96k stars, 820 forks

Notebook Code for qlora doesn't work #224

Closed: nivibilla closed this issue 1 year ago

nivibilla commented 1 year ago

None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I get the warning and error above when I enable gradient checkpointing in the notebook's trainer. Training doesn't work.

nivibilla commented 1 year ago

Fixed by setting ddp_find_unused_parameters=False in the trainer arguments.
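For anyone landing here later, this is a minimal sketch of where that flag goes, assuming the notebook builds its trainer from Hugging Face `transformers.TrainingArguments`. The helper names `build_training_kwargs` and `make_training_arguments`, the `outputs` directory, and the batch size are placeholders for illustration, not values from the notebook.

```python
# Hypothetical helper: collects the two settings discussed in this thread
# as keyword arguments for transformers.TrainingArguments.
def build_training_kwargs(output_dir="outputs"):
    return {
        "output_dir": output_dir,              # placeholder path (assumption)
        "per_device_train_batch_size": 1,      # placeholder value (assumption)
        "gradient_checkpointing": True,        # the setting that triggered the error
        "ddp_find_unused_parameters": False,   # the flag that resolved it here
    }


# Hypothetical helper: builds the TrainingArguments object itself.
# transformers is imported lazily so the dict above can be inspected
# without the library installed.
def make_training_arguments(**overrides):
    from transformers import TrainingArguments

    kwargs = build_training_kwargs()
    kwargs.update(overrides)
    return TrainingArguments(**kwargs)
```

The result of `make_training_arguments()` would then be passed as `args=` to the `Trainer`. Whether this is needed outside my setup is not something I have checked.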

artidoro commented 1 year ago

Is the notebook itself broken, or are you running it in a different setting than the standard single-GPU Colab?

nivibilla commented 1 year ago

I'm not sure. I ran it on Databricks, so it may be a Databricks-specific issue. My guess is that the environment reports multiple GPUs even though there is only one, which would explain why setting that flag fixes it.
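One way to test that guess is to look at what the process actually sees. The sketch below is only a diagnostic under assumptions: `WORLD_SIZE` and `LOCAL_RANK` are the environment variables that torchrun-style launchers set, and I have not confirmed that Databricks sets them. The function names are made up for illustration.

```python
import os


# Hypothetical diagnostic: summarizes the environment variables that
# torchrun-style launchers use to signal a distributed run.
def describe_distributed_env(env=None):
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    local_rank = int(env.get("LOCAL_RANK", "-1"))
    return {
        "world_size": world_size,
        "local_rank": local_rank,
        "cuda_visible_devices": env.get("CUDA_VISIBLE_DEVICES"),
        # True if the environment looks like a multi-process launch.
        "looks_distributed": world_size > 1 or local_rank != -1,
    }


# Hypothetical diagnostic: number of GPUs PyTorch can see,
# or None if PyTorch is not installed.
def visible_gpu_count():
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.device_count()
```

Printing `describe_distributed_env()` and `visible_gpu_count()` in a notebook cell before building the trainer should show whether the session looks distributed or exposes more than one GPU.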