Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
https://lightning.ai
Apache License 2.0

Add resume for adapter_v2, enable continued finetuning for adapter #1354

Open altria-zewei-wang opened 3 weeks ago

altria-zewei-wang commented 3 weeks ago

Hi all! Following #238, I added a function to resume finetuning for the adapter. It searches out_dir for the checkpoint with the largest step count and loads it into the state_dict. Current problem: I can restore step_count, but to also restore the iteration count from the previous run I would have to read the metrics back from the log folder. The difficulty is that I don't know how to locate the corresponding version directory of metrics.csv in the logs without adding an extra input for the version number (currently not implemented). Let me know what you think! Thanks for your repo!
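For concreteness, here is roughly what I have in mind. The helper names are mine, and I'm assuming the `step-{n}/lit_model.pth` checkpoint layout used by full.py plus the `version_{n}` directories that Lightning's CSVLogger creates:

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(out_dir: Path) -> Optional[Path]:
    # Hypothetical helper: pick the checkpoint with the largest step count,
    # assuming the "step-{n:06d}/lit_model.pth" layout from full.py.
    checkpoints = list(out_dir.glob("step-*/lit_model.pth"))
    if not checkpoints:
        return None
    return max(checkpoints, key=lambda p: int(p.parent.name.split("-")[1]))


def find_latest_metrics(log_dir: Path) -> Optional[Path]:
    # Hypothetical helper: CSVLogger writes to
    # <save_dir>/<name>/version_{n}/metrics.csv, so the highest
    # version_{n} directory holds the most recent run's metrics.
    versions = list(log_dir.glob("version_*/metrics.csv"))
    if not versions:
        return None
    return max(versions, key=lambda p: int(p.parent.name.split("_")[1]))
```

The second helper is the part I'm unsure about: picking the highest version works only if the resumed run is always the most recent one, which is why I'd rather avoid an explicit version argument.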

rasbt commented 3 weeks ago

Thanks for looking into this. Sorry, I haven't spent much time thinking through the ramifications here, but would the simple resuming logic from the full finetuning code not work in your case?

https://github.com/Lightning-AI/litgpt/blob/main/litgpt/finetune/full.py#L43
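Roughly, the resume path there does the following (a paraphrased sketch from memory, not the exact source; `state` is the dict holding the model, optimizer, iter_num, and step_count):

```python
from pathlib import Path
from typing import Optional, Union

import lightning as L


def maybe_resume(fabric: L.Fabric, resume: Union[bool, Path], out_dir: Path, state: dict) -> Optional[Path]:
    # Paraphrased sketch of full.py's resume behavior: resume=True
    # auto-discovers the newest "step-*" checkpoint under out_dir,
    # while a concrete Path resumes from exactly that file.
    if resume is True:
        resume = max(
            out_dir.rglob("step-*/*.pth"),
            default=None,
            key=lambda p: int(p.parent.name.split("-")[1]),
        )
    if resume:
        # fabric.load restores everything captured in `state` (model,
        # optimizer, iter_num, step_count, ...), so the iteration count
        # survives a restart without re-reading metrics.csv.
        fabric.load(resume, state)
    return resume or None
```

Since the counters are part of the saved training state itself, resuming shouldn't need to touch the logger's output at all.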

altria-zewei-wang commented 3 weeks ago

I was specifically testing finetuning with adapters and LoRA for my paper, and my GPU allocation gets cut off after a certain time limit. I figure adding this feature could help anyone in a similar situation.