allenai / reward-bench

RewardBench: the first evaluation tool for reward models.

https://huggingface.co/spaces/allenai/reward-bench

Apache License 2.0

442 stars 52 forks

ArmoRM Issue #162

Closed gohsyi closed 3 months ago

gohsyi commented 3 months ago

Running rewardbench --model=RLHFlow/ArmoRM-Llama3-8B-v0.1 results in the following error:

Traceback (most recent call last):
  File "/miniconda/envs/reward-bench/bin/rewardbench", line 33, in <module>
    sys.exit(load_entry_point('rewardbench', 'console_scripts', 'rewardbench')())
  File "/mnt/bn/hongyi-lq/mlx/users/hongyi.guo1/repo/12252/reward-bench/rewardbench/rewardbench.py", line 142, in main
    raise NotImplementedError("Custom dialogue not implemented yet for simpler data formatting.")
NotImplementedError: Custom dialogue not implemented yet for simpler data formatting.

Could you please provide the correct way to evaluate ArmoRM?

natolambert commented 3 months ago

Ah, yeah, the CLI is for general datasets, which doesn't work for some architectures. For ArmoRM, run:

python scripts/run_rm.py --model=RLHFlow/ArmoRM-Llama3-8B-v0.1

Keeping the CLI scripts simpler was a deliberate decision, though I'm still open to other approaches.

t-sifanwu commented 3 months ago

Running python scripts/run_rm.py --model=RLHFlow/ArmoRM-Llama3-8B-v0.1 results in the following error:

 raise ValueError(
ValueError: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. Please rerun your script specifying `--num_processes=` or by launching with `python {{myscript.py}}`.

natolambert commented 3 months ago

@t-sifanwu I believe this is because you're using multiple GPUs for inference. The simple implementation we used has some rough edges around multi-GPU setups (such as this one). I'm not sure about the "train" wording in the error message, but let me know if you need more help.
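
If multiple visible GPUs are indeed the trigger, one common workaround (an assumption here, not something confirmed in this thread) is to expose only a single GPU to the process through the CUDA_VISIBLE_DEVICES environment variable. The sketch below builds that command and environment; the helper name build_single_gpu_run and the choice of GPU index 0 are hypothetical, and it may not resolve the error in every setup.

```python
import os
import shlex
import sys


def build_single_gpu_run(model, gpu_index=0):
    """Return (command, environment) for running scripts/run_rm.py on one GPU.

    Hypothetical helper for illustration; it is not part of reward-bench.
    """
    # Copy the current environment and expose only one GPU to the child process.
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    # Same command as suggested earlier in the thread, with no space after "=".
    cmd = [sys.executable, "scripts/run_rm.py", f"--model={model}"]
    return cmd, env


cmd, env = build_single_gpu_run("RLHFlow/ArmoRM-Llama3-8B-v0.1")

# Print the equivalent shell line so it can be inspected before launching.
print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", shlex.join(cmd))

# To actually launch it, run this from the root of the reward-bench checkout:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

The launch itself is left as a comment so the command can be checked first; running it requires the reward-bench repository and its dependencies to be installed.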