allenai / open-instruct

Apache License 2.0
WizardLM Data Gone (prep data script error) #153

Status: Closed (natolambert closed 2 days ago)

natolambert commented 2 months ago

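Microsoft removed the WizardLM dataset, so prepare_train_data.sh now errors at the end, when reformat_datasets.py tries to open the missing raw file. We should probably document this or phase the dataset out.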

Simple trace:

Traceback (most recent call last):
  File "/home/nathanl/open-instruct/open_instruct/reformat_datasets.py", line 789, in <module>
    globals()[f"convert_{dataset}_data"](os.path.join(args.raw_data_dir, dataset), os.path.join(args.output_dir, dataset))
  File "/home/nathanl/open-instruct/open_instruct/reformat_datasets.py", line 496, in convert_wizardlm_data
    with open(os.path.join(data_dir, "WizardLM_evol_instruct_V2_143k.json"), "r") as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'data/raw_train/wizardlm/WizardLM_evol_instruct_V2_143k.json'
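
Until the dataset is documented as unavailable or phased out, one possible stopgap is to skip a dataset whose raw files are missing instead of letting the whole run crash. The following is only a minimal sketch, not the actual open-instruct code: `convert_if_available`, `converters`, and `required_files` are hypothetical names introduced for illustration, and the sketch assumes each converter takes an input directory and an output directory, as the call in the trace above does.

```python
import os
import warnings


def convert_if_available(dataset, raw_data_dir, output_dir, converters, required_files):
    """Run the converter for `dataset` only if its raw files exist.

    `converters` maps a dataset name to a callable(data_dir, output_dir).
    `required_files` maps a dataset name to the file names it needs.
    Both mappings are hypothetical; they are not part of open-instruct.
    Returns True if the converter ran, False if the dataset was skipped.
    """
    data_dir = os.path.join(raw_data_dir, dataset)
    # Collect every required raw file that is not present on disk.
    missing = [
        name
        for name in required_files.get(dataset, [])
        if not os.path.isfile(os.path.join(data_dir, name))
    ]
    if missing:
        # Warn and skip rather than raising FileNotFoundError mid-run.
        warnings.warn(f"Skipping {dataset}: missing raw file(s) {missing} in {data_dir}")
        return False
    converters[dataset](data_dir, os.path.join(output_dir, dataset))
    return True
```

With a guard like this, a removed upstream file produces a warning for that one dataset while the remaining datasets are still converted. Whether silently skipping is acceptable depends on how the resulting training mix is used, so documenting the removal is still needed.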