-
The Eurus-RM-7b reward model does not predict scores correctly.
1. I run the following:
```
from transformers import AutoTokenizer, AutoModel
import torch

def test(model_path):
    dataset = [ # cases in webgpt; we …
```
-
Some of the validators intermittently hit CUDA out-of-memory (OOM) errors, including the test validator.
https://wandb.ai/opentensor-dev/openvalidators/runs/7p6prmo1/logs?workspace=user-opentensor-pedro
…
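A generic mitigation for intermittent OOM is to catch the out-of-memory error and retry with a smaller batch. This is only a sketch, not code from the validator repo; `run_with_backoff` and `fake_step` are hypothetical names, and it relies on the fact that torch raises a `RuntimeError` whose message contains "out of memory" on CUDA OOM:

```python
def run_with_backoff(step, batch, min_size=1):
    """Run `step` on `batch`, halving the batch on out-of-memory errors.

    `step` is any callable that raises a RuntimeError containing
    "out of memory" when the batch is too large (as torch does on CUDA OOM).
    """
    while True:
        try:
            return step(batch)
        except RuntimeError as e:
            if "out of memory" not in str(e) or len(batch) <= min_size:
                raise
            # With torch, you would also call torch.cuda.empty_cache() here.
            batch = batch[: len(batch) // 2]

# Example: a fake step that "OOMs" on batches larger than 4 items.
def fake_step(batch):
    if len(batch) > 4:
        raise RuntimeError("CUDA out of memory. Tried to allocate ...")
    return len(batch)

print(run_with_backoff(fake_step, list(range(16))))  # → 4
```

The halving loop trades throughput for robustness; genuine (non-OOM) errors are re-raised unchanged.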
-
### SFT data
1. Started the SFT stage with publicly available instruction-tuning data ([Chung et al., 2022](https://arxiv.org/pdf/2210.11416))
2. Fewer but high-quality examples > millions of examples but low …
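The "fewer but high quality" point above can be sketched as a simple curation step. The `quality` field, the threshold, and the budget below are all hypothetical, standing in for whatever quality signal the data actually carries:

```python
def curate(examples, threshold=0.9, max_size=1000):
    """Keep only high-quality examples, capped at a small budget,
    rather than training on every available sample."""
    kept = [ex for ex in examples if ex["quality"] >= threshold]
    # Prefer the best examples when the budget is smaller than the pool.
    kept.sort(key=lambda ex: ex["quality"], reverse=True)
    return kept[:max_size]

# Toy pool: many low-quality examples plus a few good ones.
pool = [{"text": f"sample {i}", "quality": 0.5} for i in range(1000)]
pool += [{"text": "good sample", "quality": 0.95} for _ in range(10)]
print(len(curate(pool)))  # → 10
```

The point of the sketch is that the filter, not the raw pool size, determines what the SFT stage sees.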
-
Hello, I followed the steps outlined in "InstructVideo (CVPR 2024)." I'm trying to run the evaluation step `bash configs/instructvideo/eval_generate_videos.sh`, but I encounter the error below. I checke…
-
```
Traceback (most recent call last):
File "E:/3_code/4_lab/3.规划/10_具身智能/6_RA/Codeset/Mujoco/LearningHumanoidWalking-main/run_experiment.py", line 133, in
run_experiment(args)
File "E:/3…
-
Two new reward models are available: Ray2333/GRM-llama3-8B-distill (https://huggingface.co/Ray2333/GRM-llama3-8B-distill), Ray2333/Gemma-2B-rewardmodel-baseline (https://huggingface.co/Ray2333/Gemma-2…
-
Hi, I just followed your architecture and ran the code based on https://github.com/Toshihiro-Ota/decision-mamba, but the training time is unacceptable: one epoch takes 8 hours. Do you have any suggestio…
-
Hi, Thank you very much for your work!
Could you please release your model checkpoints, such as the SFT model and Reward model for each experiment, on Huggingface?
-
# Description
OpenAssistant has released on HF the reward models they trained on the open-source datasets. Even if they are not tailored to our needs, we could leverage them as a starting poin…
-
What are some of the intended use cases for the 0.5B model?
There are not many other similarly sized models, nor is there much hype around them, though the general audience seems to love th…