Closed · ElleLeonne closed 1 year ago
Hi! I, unfortunately (or fortunately lol), didn't run into this exact issue, probably because I was working with bf16 instead of fp16. A possible workaround (no guarantees) could be to multiply the values by zero instead of creating a new tensor, or to use some other in-place operation. You can check out this functional interface to ReLoRA for inspiration:
https://github.com/Guitaricet/gpt-neox/blob/relora/megatron/relora/optim.py
(the functional interface is still a work in progress and is not yet well tested)
Or you can check out the dev branch of this repository (this is the code I'm actively working with): https://github.com/Guitaricet/relora/blob/15cf7b1f7e883727f1ed226dc035858accbcfd10/peft_pretraining/training_utils.py#L161
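To make the in-place idea concrete, here is a minimal sketch (the helper name `zero_optimizer_state_` is made up for this comment, it is not part of the repo's API). The point is to mutate the existing state tensors with `mul_(0)` instead of assigning fresh tensors, so anything else that holds a reference to them, e.g. fp16/AMP machinery, keeps seeing the same storage:

```python
import torch

def zero_optimizer_state_(optimizer: torch.optim.Optimizer) -> None:
    # Hypothetical helper (not this repo's API): reset Adam-style state
    # in place. Zeroing the existing tensors keeps any external
    # references to them valid, unlike `state[key] = torch.zeros_like(v)`.
    for group in optimizer.param_groups:
        for p in group["params"]:
            for value in optimizer.state.get(p, {}).values():
                if torch.is_tensor(value):
                    value.mul_(0)  # in place, no new tensor is created
                    # note: this also zeroes Adam's step counter when it is
                    # stored as a tensor, which may or may not be what you want

# usage sketch
model = torch.nn.Linear(8, 8)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
model(torch.randn(4, 8)).sum().backward()
opt.step()                  # populates exp_avg / exp_avg_sq (and step)
zero_optimizer_state_(opt)  # reset without reallocating
```

The same trick should work for a partial, ReLoRA-style reset where you only touch the states of the parameters that were just merged.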
Neato, I'll take a look. Closing for now, thanks
I'm attempting to integrate this into a larger modular environment (PyTorch Lightning).
When I attempt to reset the optimizer states the way you do here:
Upon running the next loop, I get the error:
I'm sure you must've encountered this at some point yourself, so I'm curious how you managed to avoid it. Previously I had worked around it by just re-initializing the optimizer through accelerate, and I expected that not using accelerate would fix the issue too, but it appears not to.
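For context, the pattern I mean is roughly the following (a hypothetical, simplified sketch; my actual snippet and the error message are omitted above). My guess is that assigning brand-new tensors into `optimizer.state` leaves whatever wraps the optimizer under fp16 still holding the old tensors:

```python
import torch

def naive_reset(optimizer: torch.optim.Optimizer) -> None:
    # Hypothetical illustration of the problematic pattern: replacing the
    # state tensors instead of mutating them. The optimizer then updates
    # the new tensors, while anything that cached the old ones (my guess:
    # the fp16 machinery in my setup) still points at the originals.
    for group in optimizer.param_groups:
        for p in group["params"]:
            state = optimizer.state.get(p, {})
            for key, value in list(state.items()):
                if torch.is_tensor(value):
                    state[key] = torch.zeros_like(value)  # new tensor; old references go stale
```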