lss11005 opened this issue 2 months ago
I suspect that once NeMo models are supported in Transformers, it will be easier to create a pipeline to run RewardBench. In the meantime I have a simple hack: https://github.com/berserkr/NeMo-Aligner/blob/main/examples/nlp/gpt/nemo_bench.py :) You run it the same way you would run inference.
After training the RM (steps 1-3) with SteerLM, I get a reward model (.nemo file). Is this the final reward model?
The Nemotron-4-340B technical report shows the performance of the reward model on reward-bench. Can you share the specific evaluation method used with reward-bench, such as the model conversion step (nemo -> hf) and the parameter configuration during testing (chat_template, ...)?
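For context, this is roughly the kind of pairwise scoring I have in mind once the model is in HF format. The model path and the use of AutoModelForSequenceClassification are placeholders on my side; I understand the Nemotron RM uses a NeMo-specific regression head, so the real conversion and scoring code may differ.

```python
# Rough sketch of a reward-bench-style pairwise check against a hypothetical
# HF-converted reward model. Paths and the model class are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "/path/to/converted-hf-reward-model"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

def score(prompt: str, response: str) -> float:
    # This is where the chat_template question matters: the template baked
    # into the tokenizer decides how the pair is formatted before scoring.
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        return model(**inputs).logits[0, 0].item()

# reward-bench style accuracy: the chosen response should outscore the rejected one.
prompt = "Explain why the sky appears blue."
chosen = "The sky appears blue because of Rayleigh scattering of sunlight."
rejected = "The sky is blue because the ocean reflects onto it."
print(score(prompt, chosen) > score(prompt, rejected))
```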
Can I replace the base model used to train the reward model with another one, such as Mistral-7B? Which parameters should be modified?
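For reference, this is roughly what I imagine changing, assuming the standard NeMo-Aligner reward-model training config (e.g. training_rm.yaml). The config keys and paths below are my guesses based on typical Megatron GPT configs, not verified against the actual file.

```python
# Rough sketch, not an official recipe: RM training restores a base .nemo
# checkpoint, so swapping in Mistral-7B would mean converting the HF checkpoint
# to .nemo, pointing the config at it, and scaling down parallelism/batch
# settings. All keys and paths here are assumptions.
from omegaconf import OmegaConf

cfg = OmegaConf.load("examples/nlp/gpt/conf/training_rm.yaml")

# Base model to start RM training from (hypothetical converted checkpoint).
cfg.pretrained_checkpoint.restore_from_path = "/models/mistral-7b-base.nemo"

# A 7B model needs far less model parallelism than a 340B one.
cfg.model.tensor_model_parallel_size = 1
cfg.model.pipeline_model_parallel_size = 1

# Sequence length and batch sizes should match the new base model and hardware.
cfg.model.encoder_seq_length = 4096
cfg.model.micro_batch_size = 1
cfg.model.global_batch_size = 64

OmegaConf.save(cfg, "training_rm_mistral.yaml")
```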