Closed Lantianyou closed 3 weeks ago
Thanks for your work @aredden
I think this would be awesome, and I could work on it. The main issue is that I would need to figure out whether merging a LoRA and then unmerging it would affect the original weights. I will look into it, since that would be nice to have as an option.
Thank you for your reply. I will also try to implement it, although I am not an expert in this area.
I did some Google searching. I think PEFT claims it can merge and unmerge a LoRA, but no details are explained:
To unload the LoRA, I tried loading the same LoRA with scale=-1, but ran out of CUDA memory on a 24 GB 4090.
I guess that to unmerge the LoRA cleanly, you have to save the original weights somewhere first, but that would introduce performance overhead.
Yeah, that's the problem. You wouldn't want to keep the LoRA weights in memory; you would want to fuse them into the weights. But if you fuse them into the weights, the original weights could degrade after many fuses and unfuses.
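To make the trade-off discussed above concrete, here is a minimal sketch of both options. It is not the code from this repo or from PEFT: the helpers `to_fp16`, `fuse`, and `unfuse` are hypothetical names, the weights are a flat list of random numbers rather than real tensors, and half precision is simulated with the standard library's `struct` format `"e"`. Under those assumptions, it compares fusing and unfusing in place against restoring from a saved copy of the original weights.

```python
import random
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def fuse(weights, delta, scale):
    """Return weights + scale * delta, stored back in fp16 (hypothetical helper)."""
    return [to_fp16(w + scale * d) for w, d in zip(weights, delta)]


def unfuse(weights, delta, scale):
    """Undo a fuse by applying the same delta with the opposite sign."""
    return fuse(weights, delta, -scale)


random.seed(0)
n = 1000
original = [to_fp16(random.uniform(-1.0, 1.0)) for _ in range(n)]

# Option A: fuse and unfuse in place, using a different LoRA delta each cycle.
in_place = list(original)
for _ in range(20):
    delta = [to_fp16(random.uniform(-0.05, 0.05)) for _ in range(n)]
    in_place = unfuse(fuse(in_place, delta, 1.0), delta, 1.0)

# Option B: keep a backup of the original weights and restore from it.
backup = list(original)
delta = [to_fp16(random.uniform(-0.05, 0.05)) for _ in range(n)]
fused = fuse(original, delta, 1.0)  # weights used while the LoRA is active
restored = list(backup)             # unloading simply copies the backup back

drift_in_place = max(abs(a - b) for a, b in zip(in_place, original))
drift_restored = max(abs(a - b) for a, b in zip(restored, original))
print("max drift, fuse/unfuse in place:", drift_in_place)
print("max drift, restore from backup:", drift_restored)
```

In this toy setup, option A leaves a small nonzero rounding error in some weights, while option B restores them exactly at the cost of holding a second copy of the weights. How large the drift is for a real model depends on the weight dtype and the LoRA magnitudes, so treat the numbers here as illustrative only.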
True and true
So I implemented it, but it's not ready for a push. It seems to work well though! It includes loading and unloading, and I added a web endpoint for it.
Would you mind pushing the code to a different branch, so I can test it?
Alright, I pushed to 'removable-lora' https://github.com/aredden/flux-fp8-api/tree/removable-lora. You can test it if you want, though it's currently not in the web API, so you would have to test it via a script @Lantianyou
Thanks a lot, I will get back to you with the results.
I tested this branch and found that unloading a LoRA on a single 4090 causes an OOM.
@aredden I can successfully unload the LoRA immediately after loading it, but if I unload it after performing an inference, an OOM occurs.
Ah, I guess it might need some work on cleaning up the LoRAs after loading / unloading. I will work on this, thanks @81549361
Thank you very much, your repo is awesome!
Alright I merged it into the main branch
Currently, it seems the LoRA is loaded before the API server comes up. Is there a way to load a LoRA on request and then unload it once that request has finished?
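One way to structure the per-request behaviour asked about above is a context manager that loads the LoRA, runs the request, and always unloads afterwards. This is only a sketch under assumptions: `FakePipeline`, its `load_lora` / `unload_lora` / `generate` methods, `temporary_lora`, and `handle_request` are all hypothetical names used for illustration and are not this repo's API.

```python
from contextlib import contextmanager


class FakePipeline:
    """Hypothetical stand-in for a model object with LoRA load/unload methods."""

    def __init__(self):
        self.active = []  # list of (path, scale) for currently loaded LoRAs

    def load_lora(self, path, scale=1.0):
        self.active.append((path, scale))

    def unload_lora(self, path):
        self.active = [(p, s) for p, s in self.active if p != path]

    def generate(self, prompt):
        names = [p for p, _ in self.active]
        return f"{prompt} [loras: {names}]"


@contextmanager
def temporary_lora(pipe, path, scale=1.0):
    """Load a LoRA for the duration of the with-block, then unload it."""
    pipe.load_lora(path, scale)
    try:
        yield pipe
    finally:
        # Runs even if generation raises, so the LoRA never stays loaded.
        pipe.unload_lora(path)


def handle_request(pipe, prompt, lora_path=None, scale=1.0):
    """Serve one request, optionally with a LoRA that lives only for this call."""
    if lora_path is None:
        return pipe.generate(prompt)
    with temporary_lora(pipe, lora_path, scale):
        return pipe.generate(prompt)


pipe = FakePipeline()
result = handle_request(pipe, "a cat", lora_path="style.safetensors")
print(result)
print("loaded after request:", pipe.active)
```

Because the LoRA is fused into shared model weights, a real server would also need to serialize such requests (for example with a lock) so that one request's LoRA does not leak into another's generation.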