Anjali295 / CatBench


Writing output to file #2

Open ykl7 opened 2 hours ago

ykl7 commented 2 hours ago

https://github.com/Anjali295/CatBench/blob/0eaf7d7c71c348942fede19563f12c326745e01a/src/gemini.py#L94

Please convert this to a JSONL output file, where each line is a single dictionary (a JSON object) that can be read and parsed on its own.
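To illustrate why this format is easy to consume, here is a minimal sketch of reading such a file back one record at a time. The function name read_jsonl is hypothetical and not part of the repository; only the standard json library is used.

```python
import json

def read_jsonl(path):
    """Yield one parsed dictionary per non-empty line of a JSONL file."""
    with open(path, "r", encoding="utf-8") as fp:
        for line in fp:
            line = line.strip()
            if line:  # skip blank lines
                # Each line is a complete JSON document on its own.
                yield json.loads(line)
```

Because every line is parsed independently, a consumer can stream a large output file without loading it all into memory.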

ykl7 commented 2 hours ago

Changes to make:

  1. Convert this argument to output_file.
  2. In this function, for each data point, save plan_idx, goal, steps, model_input, question_type, raw_model_output, processed_model_output, and gold_label into a dictionary (i.e., a JSON object) and write that dictionary as one line of output_file. You can use the json library to do this.

Note that gold_label = True for all data points in the dependent_* files, and False for all others.
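The changes listed above could be sketched roughly as follows. This is an illustration, not the code in gemini.py: the function names gold_label_for and write_record are hypothetical, and the sketch assumes a data point's source file can be identified by a file name that starts with dependent_.

```python
import json
import os

def gold_label_for(input_path):
    """Assumed rule: True if the file name matches dependent_*, else False."""
    return os.path.basename(input_path).startswith("dependent_")

def write_record(fp, plan_idx, goal, steps, model_input, question_type,
                 raw_model_output, processed_model_output, gold_label):
    """Write one data point as a single JSON line to an open file object."""
    record = {
        "plan_idx": plan_idx,
        "goal": goal,
        "steps": steps,
        "model_input": model_input,
        "question_type": question_type,
        "raw_model_output": raw_model_output,
        "processed_model_output": processed_model_output,
        "gold_label": gold_label,
    }
    # json.dumps escapes embedded newlines, so the record stays on one line.
    fp.write(json.dumps(record) + "\n")
```

Serializing with json.dumps rather than str() keeps booleans as true/false and escapes quotes and newlines in the model output, so each line remains valid JSON.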

ykl7 commented 2 hours ago

Instead of opening the file in append mode each time (see here), initialize the file pointer once before the loop starts, using with open(file, 'w+') as fp: and pass the fp file pointer into the writing function (https://github.com/Anjali295/CatBench/blob/0eaf7d7c71c348942fede19563f12c326745e01a/src/gemini.py#L94). Then every time a new record is sent to be written, it will be added at the current end of the file and the file pointer will move forward.
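A minimal sketch of the suggestion above, assuming the records are already available as dictionaries. The names write_json_line and write_all are hypothetical stand-ins for the functions in gemini.py.

```python
import json

def write_json_line(fp, record):
    """Write one record at the file pointer's current position."""
    fp.write(json.dumps(record) + "\n")

def write_all(records, output_file):
    """Open the output file once, then reuse the same file pointer."""
    # 'w+' truncates any existing file, so each run starts from an empty file.
    with open(output_file, "w+", encoding="utf-8") as fp:
        for record in records:
            # No reopen per record: fp advances after every write.
            write_json_line(fp, record)
```

One difference from append mode worth keeping in mind: because 'w+' truncates the file on open, rerunning the script replaces the previous output instead of adding to it.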