allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0

possibly a typo in `load_bon_dataset.py` #146

Closed: mickelliu closed this issue 3 months ago

mickelliu commented 3 months ago

Line 486 applies the `map` function to `raw_dataset` instead of the processed `unrolled_dataset`. This raises a `KeyError`, because `raw_dataset` does not have the key `input`. If I switch to `unrolled_dataset`, the code continues as usual.

https://github.com/allenai/reward-bench/blob/e7b62d8c61e37567c26d52fce6beb9c52b4a8f42/rewardbench/utils.py#L476-L490

# raw_dataset
Dataset({
    features: ['dataset_details', 'output', 'model_input', 'model', 'id', 'config', 'prompt', 'subset'],
    num_rows: 1770
})
# unrolled_dataset
Dataset({
    features: ['dataset_details', 'model_input', 'model', 'config', 'prompt', 'subset', 'input', 'id'],
    num_rows: 7080
})
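
The mismatch can be reproduced in isolation. Below is a minimal sketch that uses plain Python dicts rather than a real `datasets.Dataset`. The rows, the `unroll` helper, and the `tag_input` function are hypothetical stand-ins, not the code in `rewardbench/utils.py`. It only illustrates why a per-row function that reads `input` fails on rows that still carry `output`.

```python
# Hypothetical stand-in rows: plain dicts instead of a datasets.Dataset.
# Each raw row holds a list of candidate completions under "output".
raw_rows = [
    {"id": 0, "prompt": "Hi", "output": ["a", "b"]},
    {"id": 1, "prompt": "Yo", "output": ["c", "d"]},
]


def unroll(rows):
    """Emit one row per candidate, storing the candidate under 'input'."""
    unrolled = []
    for row in rows:
        # Copy every column except "output", which is replaced by "input".
        base = {k: v for k, v in row.items() if k != "output"}
        for candidate in row["output"]:
            unrolled.append({**base, "input": candidate})
    return unrolled


def tag_input(row):
    """Hypothetical per-row function that reads the 'input' key."""
    return {**row, "input_length": len(row["input"])}


unrolled_rows = unroll(raw_rows)

# Mapping over the raw rows fails: they have "output" but no "input".
try:
    [tag_input(r) for r in raw_rows]
    raw_error = None
except KeyError as err:
    raw_error = err

# Mapping over the unrolled rows succeeds.
mapped = [tag_input(r) for r in unrolled_rows]

print(repr(raw_error))             # KeyError('input')
print(len(raw_rows), len(mapped))  # 2 4
```

The same pattern shows up in the printouts above: only `unrolled_dataset` lists `input` among its features, so it is the dataset a function reading `input` has to be mapped over.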