YangRui2015 closed this 4 months ago
Two new reward models are available: Ray2333/GRM-llama3-8B-distill (https://huggingface.co/Ray2333/GRM-llama3-8B-distill) and Ray2333/Gemma-2B-rewardmodel-baseline (https://huggingface.co/Ray2333/Gemma-2B-rewardmodel-baseline). Both are fine-tuned on open-source datasets and achieve average scores of 86.1 and 73.7, respectively, when evaluated locally.
Details are on the Hugging Face model pages.
Nice, will add these shortly with Ray2333/GRM-Gemma-2B-sftreg and Ray2333/GRM-llama3-8B-sftreg. Cool paper!
Thank you!