Closed — gohsyi closed this 3 months ago
Ah, yeah, the CLI is for general datasets (which doesn't work for some architectures). It is run with:

```
python scripts/run_rm.py --model=RLHFlow/ArmoRM-Llama3-8B-v0.1
```

Keeping the CLI scripts simple was a deliberate decision, but I'm still open to other approaches.
Running with

```
python scripts/run_rm.py --model=RLHFlow/ArmoRM-Llama3-8B-v0.1
```

results in the following error:
```
raise ValueError(
ValueError: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. Please rerun your script specifying `--num_processes=` or by launching with `python {{myscript.py}}`.
```
@t-sifanwu I believe this is because you're using multiple GPUs for inference.
The simple implementation we used has some rough edges around this (such as this). I'm not sure about the "train" wording in that error, but let me know if you need more help.
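Since the error comes from the model being sharded across multiple GPUs via `device_map='auto'`, one common workaround (a hedged sketch, not a confirmed fix from the maintainers) is to restrict the process to a single GPU before any CUDA libraries initialize. With only one visible device, the model cannot be split across GPUs, so the distributed-mode check is not triggered. The GPU index `0` here is an assumption; use whichever device you prefer:

```python
import os

# Pin this process to a single GPU (index "0" is an assumption — choose your own).
# This must run before torch/transformers initialize CUDA, so place it at the
# very top of the script or set the variable in the shell instead.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Equivalently, prefix the command in the shell: `CUDA_VISIBLE_DEVICES=0 python scripts/run_rm.py --model=RLHFlow/ArmoRM-Llama3-8B-v0.1`.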
Running

```
rewardbench --model=RLHFlow/ArmoRM-Llama3-8B-v0.1
```

results in an error. Could you please provide the correct way to evaluate ArmoRM?