# Implementing Proximal Policy Optimisation
I've used some of the [PyTorch RFC](https://github.com/pytorch/rfcs/blob/master/README.md) template here for clarity.
**Authors:**
* @salmanmohammadi…
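For context, here is a minimal sketch of the clipped surrogate objective at the core of PPO, in PyTorch. The function and argument names (`ppo_clip_loss`, `old_log_probs`, `advantages`) are illustrative placeholders, not part of the RFC:
```
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss from the PPO paper (Schulman et al., 2017).

    log_probs: log pi_theta(a|s) under the current policy
    old_log_probs: log pi_theta_old(a|s) from the rollout policy (detached)
    advantages: advantage estimates, e.g. from GAE
    """
    # Probability ratio r_t(theta) = pi_theta / pi_theta_old
    ratio = torch.exp(log_probs - old_log_probs)
    # Unclipped and clipped surrogate objectives
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (elementwise minimum) of the two; negate for gradient descent
    return -torch.min(unclipped, clipped).mean()
```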
-
By training on hypothetical world models, we may need less data from the original environment. Does our algorithm actually need fewer samples than typical RL trained directly on the real environment? Us…
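To make the question concrete, here is a minimal tabular Dyna-Q sketch of the idea: every real transition also trains a learned model, which then generates imagined updates, so fewer real samples are needed. It assumes a Gymnasium-style discrete environment; all names are illustrative:
```
import random
from collections import defaultdict

def dyna_q(env, episodes=200, planning_steps=10, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Dyna-Q: each real transition also updates a learned model,
    which then generates `planning_steps` imagined (hypothetical) updates."""
    actions = list(range(env.action_space.n))
    Q = defaultdict(float)   # Q[(state, action)]
    model = {}               # learned deterministic model: (s, a) -> (r, s', done)

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda b: Q[(s, b)]))
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            update(s, a, r, s2, done)        # direct RL from the real sample
            model[(s, a)] = (r, s2, done)    # learn the world model
            for _ in range(planning_steps):  # plan on imagined transitions
                (ps, pa), (pr, ps2, pd) = random.choice(list(model.items()))
                update(ps, pa, pr, ps2, pd)
            s = s2
    return Q
```
With `planning_steps > 0`, the same Q-values are typically reached with far fewer real episodes than plain Q-learning, which is exactly the sample-efficiency effect being asked about.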
-
**Is your feature request related to a problem? Please describe.**
Converting an HF reward model to .nemo doesn't seem to work right now. See discussion in #109 for details.
**Describe the soluti…
-
More of a question than a bug: will you be working on examples of using unsloth to train Reward Models (https://huggingface.co/docs/trl/main/en/reward_trainer) as well?
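For reference, a minimal sketch of the plain (non-unsloth) TRL `RewardTrainer` path from the linked docs; the model and dataset names here are placeholders, and exact keyword names vary across TRL versions:
```
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Reward models are trained as single-logit sequence classifiers
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

# A preference dataset with "chosen" and "rejected" columns
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = RewardConfig(output_dir="reward-model", per_device_train_batch_size=2)
trainer = RewardTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions use `tokenizer=` instead
)
trainer.train()
```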
-
Hi, thanks for your impactful work. :)
Recently, my coauthors and I submitted a paper, and we found that our model, Gemma-MMPO, shows state-of-the-art results among 7B DPO models (first place when …
-
Eurus-RM-7b cannot predict the score correctly.
1. I run:
```
from transformers import AutoTokenizer, AutoModel
import torch

def test(model_path):
    dataset = [  # cases in webgpt; we …
```
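For comparison, a hedged sketch of scoring a single example. My reading of the Eurus-RM-7b model card is that it ships custom modeling code (so `trust_remote_code=True` is needed) and that its forward pass returns a scalar reward, but both details are worth verifying; the input text below is illustrative:
```
from transformers import AutoTokenizer, AutoModel
import torch

model_path = "openbmb/Eurus-RM-7b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Assumption: Eurus-RM-7b ships custom modeling code, so trust_remote_code is required
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)

# Illustrative input; the expected prompt template may affect scores
text = "[INST] What is the capital of France? [/INST] Paris."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).item()  # assumption: forward returns a scalar reward
print(reward)
```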
-
Two new reward models are available: Ray2333/GRM-llama3-8B-distill (https://huggingface.co/Ray2333/GRM-llama3-8B-distill), Ray2333/Gemma-2B-rewardmodel-baseline (https://huggingface.co/Ray2333/Gemma-2…
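If useful for integration, a hedged sketch of scoring with one of these models, assuming it loads as a single-logit sequence classifier and that the tokenizer provides a chat template; the model choice and messages are illustrative:
```
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Ray2333/GRM-llama3-8B-distill"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assumption: the checkpoint is a sequence-classification reward head (num_labels=1)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.float16, num_labels=1
)

messages = [
    {"role": "user", "content": "Explain photosynthesis in one sentence."},
    {"role": "assistant", "content": "Plants turn sunlight, water, and CO2 into sugar and oxygen."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
with torch.no_grad():
    score = model(input_ids).logits[0].item()  # single logit = scalar reward
print(score)
```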
-
Hello, I followed the steps outlined in "InstructVideo (CVPR 2024)." I'm trying to run the evaluation step: `bash configs/instructvideo/eval_generate_videos.sh`, but I encounter the error below. I checke…
-
Hi, I just followed your architecture and ran the code based on https://github.com/Toshihiro-Ota/decision-mamba, but the training time is unacceptable: one epoch takes 8 hours. Do you have any suggestio…