allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
277 stars, 27 forks

Add multi-gpu inference option #125

Open natolambert opened 1 month ago

natolambert commented 1 month ago

Currently, run_rm.py uses only one GPU because reward models (RMs) are generally not well supported for inference. The current implementation is a separate script, run_rm_mpgu.py. We can delete it and improve the base script if more use cases emerge.

Closes #95

natolambert commented 1 month ago

Distributed inference works, but I couldn't get the results gathering to work correctly with things like:

    state.wait_for_everyone()

    # flatten results list of lists if is list of lists
    if state.is_main_process:
        logger.info("Gathering results")
        results = gather_object(results) # gather() is for tensors
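
One possible cause, assuming the snippet above is complete: `gather_object` is a collective operation, so every process has to call it, and calling it only inside the `is_main_process` branch can leave the main process waiting on ranks that never join. Below is a minimal sketch of the alternative ordering, not the fix adopted in this PR. The helper names `gather_results` and `flatten_if_nested`, and the injectable `state` and `gather_fn` parameters, are hypothetical and exist only for illustration. The flattening step assumes each rank holds a list of per-batch lists.

```python
import logging
from typing import Any, Callable, List, Optional

logger = logging.getLogger(__name__)


def flatten_if_nested(items: List[Any]) -> List[Any]:
    """Flatten one level if every element is a list; otherwise return as is."""
    if items and all(isinstance(x, list) for x in items):
        return [y for x in items for y in x]
    return items


def gather_results(
    local_results: List[Any],
    state: Optional[Any] = None,
    gather_fn: Optional[Callable[[List[Any]], List[Any]]] = None,
) -> Optional[List[Any]]:
    """Collect per-process results. Every rank must call this function."""
    if state is None or gather_fn is None:
        # Lazy import so the helper can be exercised without accelerate installed.
        from accelerate import PartialState
        from accelerate.utils import gather_object  # gather() is for tensors

        state = state or PartialState()
        gather_fn = gather_fn or gather_object

    state.wait_for_everyone()

    # Collective call: run it on all ranks, not only on the main process.
    gathered = gather_fn(local_results)

    # Only the main process post-processes and returns the combined results.
    if state.is_main_process:
        logger.info("Gathering results")
        return flatten_if_nested(gathered)
    return None
```

In a script started with `accelerate launch`, each process would call `gather_results(results)` with its own shard of results; only the main process gets a list back, and the other ranks get `None`.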