TingchenFu closed 5 months ago
Hi,
During the evaluation phase, we use prompts with the desired scores as input. In line 154 of evaluation.py, we replace 'input_ids' with 'prompt_with_score_ids'. The process is correct, but it involves redundant processing. I will improve the readability soon.
valid_dataset = valid_dataset.remove_columns('input_ids')
valid_dataset = valid_dataset.rename_column('prompt_with_score_ids', 'input_ids')
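To make the effect of those two calls easier to see, here is a minimal sketch that models the dataset as a plain dict of columns. It is only a stand-in, not the real dataset object, and the helper name `use_prompt_with_score_as_input` and the token ids are made up for illustration.

```python
# Plain-dict stand-in for a columnar dataset (an assumption for illustration;
# the real code operates on the dataset object in evaluation.py).
def use_prompt_with_score_as_input(columns):
    """Return a copy in which 'prompt_with_score_ids' takes the place of 'input_ids'."""
    # Step 1: drop the old 'input_ids' column (mirrors remove_columns).
    swapped = {name: values for name, values in columns.items() if name != "input_ids"}
    # Step 2: rename 'prompt_with_score_ids' to 'input_ids' (mirrors rename_column).
    swapped["input_ids"] = swapped.pop("prompt_with_score_ids")
    return swapped


# Hypothetical token ids; 9 stands for the appended score tokens.
valid_columns = {
    "input_ids": [[1, 2, 3], [4, 5]],
    "prompt_with_score_ids": [[1, 2, 3, 9, 9], [4, 5, 9, 9]],
}

swapped_columns = use_prompt_with_score_as_input(valid_columns)
print(sorted(swapped_columns))
print(swapped_columns["input_ids"])
```

After the swap, anything downstream that reads 'input_ids' receives the prompt with the desired scores, which is why the evaluation logic is still correct even though the original 'input_ids' column was tokenized and then discarded.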
Thanks for your quick reply! Indeed, the logic is correct; I just missed the details.
In ric/utils.py line 223, I guess the
sample["input_ids"] = tokenizer.encode(sample["text"])
should be
sample["input_ids"] = tokenizer.encode(sample["prompt"])
instead.