Vahe1994 / SpQR

Apache License 2.0

Datautils upd 2 #29

Closed poedator closed 1 year ago

poedator commented 1 year ago

updated, cleaner version of https://github.com/Vahe1994/SpQR/pull/20

- separate loading of train and eval data via the eval_mode param
- combine args.dataset and args.custom_data_path into one option
- load pajama and refinedweb through the dataset option (both are included in this repo at a fixed location)
- tests done in notebook

next step: separate train and eval loading, to save time by skipping the lengthy train data loading in the eval stage.
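
A rough sketch of how a single dataset option plus an eval_mode switch could fit together. This is not the code from this PR: `resolve_dataset`, `describe_loader`, and both name sets are hypothetical and only illustrate the idea of one argument that accepts either a known dataset name or a path to custom data, with the split chosen by `eval_mode`.

```python
import os

# Hypothetical name sets for illustration; the repo's actual names may differ.
DOWNLOADABLE = {"wikitext2", "c4", "ptb"}
BUNDLED = {"pajama", "refinedweb"}  # shipped with the repo at a fixed location


def resolve_dataset(dataset):
    """Classify one `dataset` value as a known name or a custom data path."""
    name = dataset.lower()
    if name in DOWNLOADABLE:
        return "downloadable", name
    if name in BUNDLED:
        return "bundled", name
    if os.path.isfile(dataset):
        # Anything that is not a known name is treated as a path to custom data.
        return "custom_file", dataset
    raise ValueError(f"unknown dataset name or missing file: {dataset!r}")


def describe_loader(dataset, eval_mode=False):
    """Return which source and split would be loaded, without loading anything."""
    kind, source = resolve_dataset(dataset)
    # With eval_mode=True only the eval split is requested, so the
    # (lengthy) train split never has to be loaded in the eval stage.
    split = "eval" if eval_mode else "train"
    return {"kind": kind, "source": source, "split": split}
```

Under these assumptions, `describe_loader("pajama")` picks the train split of a bundled dataset, while `describe_loader("wikitext2", eval_mode=True)` asks only for the eval split.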