Open natolambert opened 2 months ago
Hi @natolambert, I recently trained a generative RM (prometheus-eval/prometheus-RM-Llama-8B-v1.0) based on the CLoud codebase, and I think inference with Hugging Face transformers could be easily integrated into the existing reward-bench code. Can I try working on this?
See @zankner's repo https://github.com/zankner/CLoud, RMs that think out loud!