G-JWLee opened this issue 1 year ago
Hi, thanks!
I understand get_lora_params() loads the LoRA parameters into the optimizer, but if the model itself can compute gradients, wouldn't it still compute them? Would freezing the model be enough to use minlora without get_lora_params()?
Probably yes, but you need to make sure you don't accidentally freeze the lora parameters.
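For reference, a rough sketch of that ordering (freeze first, then re-enable LoRA). This assumes minlora's add_lora and that the injected parameter names contain "lora_" (i.e. lora_A / lora_B); MyModel is a placeholder for your backbone:

```python
import torch
from minlora import add_lora  # injects LoRA parametrizations into the model

model = MyModel()  # placeholder for your backbone
add_lora(model)

# Freeze everything first...
for param in model.parameters():
    param.requires_grad = False

# ...then re-enable only the LoRA parameters, so they are not
# accidentally frozen along with the backbone.
for name, param in model.named_parameters():
    if "lora_" in name:  # assumption: minlora names them lora_A / lora_B
        param.requires_grad = True

# Only trainable (LoRA) parameters go to the optimizer.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```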
Also, when merging the LoRA weights into the model so that I can add another LoRA module, do I have to set lora_A and lora_B to requires_grad=False before merging?
Probably not. After merging, lora_A and lora_B will no longer exist.
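Roughly, the flow looks like this (a sketch, assuming minlora's add_lora / merge_lora API):

```python
from minlora import add_lora, merge_lora

add_lora(model)
# ... train the first LoRA ...

merge_lora(model)  # folds the low-rank update into the base weights;
                   # lora_A / lora_B are removed in the process, so there
                   # is nothing left to set requires_grad=False on
add_lora(model)    # attach a fresh LoRA module on top of the merged weights
```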
Thank you for your kind reply.
However, the example at https://github.com/cccntu/LoRAnanoGPT/blob/master/train.py, line 236, uses DDP without the find_unused_parameters=True argument. When I run my own experiment in a different setting with DDP, I get an error because the backbone model has requires_grad=False, so its parameters are not used for gradient computation when find_unused_parameters=True is not specified. Is there something I missed? I believe this API works with DDP.
Thank you!
Honestly, I don't know. Can you solve it by simply adding find_unused_parameters=True? I've only used it on one GPU.
Or does using get_lora_params() solve this issue?
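Something like this, i.e. passing the flag when wrapping the model (local_rank here is a placeholder for however you obtain the device index):

```python
from torch.nn.parallel import DistributedDataParallel as DDP

model = DDP(
    model,
    device_ids=[local_rank],       # placeholder device index
    find_unused_parameters=True,   # tolerate params that receive no gradient
)
```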
It looks like this method is correct in the sense that it only updates the parameters you pass to the optimizer, but Torch will still compute gradients for all weights, as requires_grad is still True, according to this thread:
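To illustrate the point (a rough sketch: x is a placeholder input batch, and the "lora_" name check assumes minlora's lora_A / lora_B naming):

```python
import torch
from minlora import get_lora_params

# Only the LoRA parameters are handed to the optimizer...
optimizer = torch.optim.AdamW(list(get_lora_params(model)), lr=1e-3)

loss = model(x).sum()  # x: placeholder input batch
loss.backward()

# ...but autograd still computed (and stored) gradients for the backbone
# weights, since their requires_grad is still True:
print(any(p.grad is not None
          for n, p in model.named_parameters() if "lora_" not in n))
```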