abaheti95 / LoL-RL

Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
MIT License

unable to import utils #1

Open JiuhaiChen opened 11 months ago

JiuhaiChen commented 11 months ago

Hi, thanks for releasing the codebase, it's really helpful. It seems that I am unable to import `utils`. For example, `data_cleaning.py` has `from utils import save_in_jsonl, distinctness, load_from_pickle`; should `save_in_jsonl`, `distinctness`, and `load_from_pickle` be imported from `utils.utils` instead? The same problem occurs in other files. I am also unable to save "eval_cache.pkl" in `data_cleaning.py`. Can you check the issue? Thanks!
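For context, the failure mode described here would look like the following if the helpers live in a `utils/utils.py` module inside a `utils` package (a minimal, self-contained reproduction; the stub `save_in_jsonl` is a placeholder, not the repo's implementation):

```python
import os
import sys
import tempfile

# Hypothetical reproduction: the helpers live in utils/utils.py
# (a module inside a `utils` package), so the flat
# `from utils import ...` used in data_cleaning.py fails.
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "utils"))
open(os.path.join(tmp, "utils", "__init__.py"), "w").close()
with open(os.path.join(tmp, "utils", "utils.py"), "w") as f:
    f.write("def save_in_jsonl(records, path):\n    pass\n")
sys.path.insert(0, tmp)

flat_import_failed = False
try:
    from utils import save_in_jsonl  # what data_cleaning.py tried
except ImportError:
    flat_import_failed = True

from utils.utils import save_in_jsonl  # the package-qualified fix
```

Because the package's `__init__.py` does not re-export the helpers, only the package-qualified import succeeds.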

abaheti95 commented 11 months ago

Thank you for pointing them out. I updated the imports in the big model training files. Please let me know if there are still any other issues.

JiuhaiChen commented 11 months ago

Thanks for fixing it. There are still some issues with `data_cleaning.py`: it cannot find "eval_cache.pkl", and if I skip that step, the training procedure cannot find `cleaner_train.json` under the folder `data/hh_train_len2/`.

One more question: have you tried full-model fine-tuning instead of LoRA?

abaheti95 commented 11 months ago

Hi @JiuhaiChen , thank you for helping me debug this. It seems there was a lot of redundant code in the `data_cleaning.py` file. I removed it and uncommented the lines that save the cleaned data.

Regarding full fine-tuning: no, I haven't tried it, mainly because I wanted to use priority sampling in A-LoL, which is not trivial to do with DeepSpeed and the Trainer. I will get to it at some point in the future, but not right now.
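For reference, the priority sampling mentioned here, drawing training examples with probability proportional to their advantage rather than uniformly, can be sketched in plain Python. The function name and the non-negativity clipping below are illustrative assumptions, not the paper's exact scheme:

```python
import random

def priority_sample(indices, advantages, batch_size, eps=1e-4):
    # Illustrative sketch: sample example indices with probability
    # proportional to their (clipped, non-negative) advantage,
    # instead of the uniform shuffling a standard Trainer performs.
    weights = [max(a, 0.0) + eps for a in advantages]
    return random.choices(indices, weights=weights, k=batch_size)

# High-advantage examples dominate the sampled batches.
idx = list(range(4))
adv = [0.0, 0.1, 0.2, 5.0]
batch = priority_sample(idx, adv, batch_size=8)
```

This is easy to express with a custom sampling loop, but a distributed `Trainer` with DeepSpeed owns the data loader and its sharding, which is why plugging such a sampler in is non-trivial.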

Feel free to add follow-up questions if you still have trouble running the code.