abaheti95 / LoL-RL

Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
MIT License

unable to import utils #1

Open JiuhaiChen opened 1 year ago

JiuhaiChen commented 1 year ago

Hi, thanks for releasing the codebase, it's really helpful. It seems that I am unable to import utils. For example, data_cleaning.py has `from utils import save_in_jsonl, distinctness, load_from_pickle`, but save_in_jsonl, distinctness, and load_from_pickle are actually under utils.utils. The same problem occurs in other files. I am also unable to save "eval_cache.pkl" in data_cleaning.py. Can you check the issue? Thanks!
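For context, the fix the issue describes is importing from the package's inner module, i.e. `from utils.utils import save_in_jsonl, distinctness, load_from_pickle`. The helpers themselves are small I/O utilities; a minimal sketch of what such functions plausibly do (these bodies are my assumptions for illustration, not the repo's actual implementations):

```python
import json
import pickle

# After the fix, data_cleaning.py would import these as:
#   from utils.utils import save_in_jsonl, distinctness, load_from_pickle
# (instead of `from utils import ...`).

def save_in_jsonl(records, path):
    """Write an iterable of dicts to a JSON Lines file, one object per line."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def load_from_pickle(path):
    """Load a pickled Python object (e.g. a cached eval result) from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)
```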

abaheti95 commented 1 year ago

Thank you for pointing them out. I updated the imports in the big model training files. Please let me know if there are still any other issues.

JiuhaiChen commented 1 year ago

Thanks for fixing it. There are still some issues with data_cleaning.py: it cannot find "eval_cache.pkl", and if I skip that step, the training procedure cannot find cleaner_train.json under the folder data/hh_train_len2/.

One more question: have you tried full-model fine-tuning instead of LoRA?
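Regarding the missing eval_cache.pkl above: a common defensive pattern is to treat the cache file as optional, so the script recomputes and re-saves it when it is absent. A minimal sketch (the path, function names, and dict-shaped cache are my assumptions, not the repo's actual code):

```python
import os
import pickle

def load_eval_cache(path="eval_cache.pkl"):
    """Return the cached evaluations if the pickle exists; otherwise return
    an empty dict so the caller can recompute the cache from scratch."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}

def save_eval_cache(cache, path="eval_cache.pkl"):
    """Persist the eval cache so later runs can skip recomputation."""
    with open(path, "wb") as f:
        pickle.dump(cache, f)
```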

abaheti95 commented 1 year ago

Hi @JiuhaiChen, thank you for helping me debug this. It seems there was a lot of redundant code in the data_cleaning.py file. I removed it and uncommented the lines that save the cleaned data.

Regarding full fine-tuning: no, I haven't tried it, mainly because I wanted to use priority sampling in A-LoL, which is not trivial to do with DeepSpeed and the Trainer. I will get to it at some point, but not right now.
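Priority sampling here means drawing training examples with probability proportional to their estimated advantage, so high-advantage sequences are revisited more often. A minimal sketch of that idea (function name, clipping of non-positive advantages, and the uniform fallback are my assumptions, not the paper's exact procedure):

```python
import random

def priority_sample(advantages, batch_size, rng=None):
    """Sample example indices with probability proportional to their positive
    advantage estimates; examples with zero or negative advantage are skipped.
    Falls back to uniform sampling if no advantage is positive."""
    rng = rng or random.Random()
    weights = [max(a, 0.0) for a in advantages]
    if sum(weights) == 0:
        weights = [1.0] * len(advantages)  # uniform fallback
    return rng.choices(range(len(advantages)), weights=weights, k=batch_size)
```

The difficulty alluded to above is that this kind of weighted, per-example sampling does not map cleanly onto the fixed dataloader sharding that DeepSpeed and the Hugging Face Trainer assume.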

Feel free to add more follow-up questions in case you still struggle to run the code.