-
The Eurus-RM-7b reward model does not predict scores correctly.
1. I run the following:
```
from transformers import AutoTokenizer, AutoModel
import torch

def test(model_path):
    dataset = [ # cases in webgpt; we …
```
-
Some of the validators intermittently hit CUDA out-of-memory (OOM) errors, including the test validator.
https://wandb.ai/opentensor-dev/openvalidators/runs/7p6prmo1/logs?workspace=user-opentensor-pedro
…
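A generic mitigation for intermittent OOM is to catch the out-of-memory error and retry with a smaller batch. This is only a sketch, not code from the validator repo; `run_with_backoff` and `fake_step` are hypothetical names, and it relies on the fact that torch raises a `RuntimeError` whose message contains "out of memory" on CUDA OOM:

```python
def run_with_backoff(step, batch, min_size=1):
    """Run `step` on `batch`, halving the batch on out-of-memory errors.

    `step` is any callable that raises a RuntimeError containing
    "out of memory" when the batch is too large (as torch does on CUDA OOM).
    """
    while True:
        try:
            return step(batch)
        except RuntimeError as e:
            if "out of memory" not in str(e) or len(batch) <= min_size:
                raise
            # With torch, you would also call torch.cuda.empty_cache() here.
            batch = batch[: len(batch) // 2]

# Example: a fake step that "OOMs" on batches larger than 4 items.
def fake_step(batch):
    if len(batch) > 4:
        raise RuntimeError("CUDA out of memory. Tried to allocate ...")
    return len(batch)

print(run_with_backoff(fake_step, list(range(16))))  # → 4
```

The halving loop trades throughput for robustness; genuine (non-OOM) errors are re-raised unchanged.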
-
### SFT data
1. Started the SFT stage with publicly available instruction-tuning data ([Chung et al., 2022](https://arxiv.org/pdf/2210.11416))
2. Fewer but high-quality examples > millions of examples but low …
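The "fewer but high quality" point above can be sketched as a simple curation step. The `quality` field, the threshold, and the budget below are all hypothetical, standing in for whatever quality signal the data actually carries:

```python
def curate(examples, threshold=0.9, max_size=1000):
    """Keep only high-quality examples, capped at a small budget,
    rather than training on every available sample."""
    kept = [ex for ex in examples if ex["quality"] >= threshold]
    # Prefer the best examples when the budget is smaller than the pool.
    kept.sort(key=lambda ex: ex["quality"], reverse=True)
    return kept[:max_size]

# Toy pool: many low-quality examples plus a few good ones.
pool = [{"text": f"sample {i}", "quality": 0.5} for i in range(1000)]
pool += [{"text": "good sample", "quality": 0.95} for _ in range(10)]
print(len(curate(pool)))  # → 10
```

The point of the sketch is that the filter, not the raw pool size, determines what the SFT stage sees.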
-
Hello, I followed the steps outlined in "InstructVideo (CVPR 2024)." I'm trying to run the evaluation step `bash configs/instructvideo/eval_generate_videos.sh`, but I encounter the error below. I checke…
-
```
Traceback (most recent call last):
File "E:/3_code/4_lab/3.规划/10_具身智能/6_RA/Codeset/Mujoco/LearningHumanoidWalking-main/run_experiment.py", line 133, in
run_experiment(args)
File "E:/3…
-
Two new reward models are available: Ray2333/GRM-llama3-8B-distill (https://huggingface.co/Ray2333/GRM-llama3-8B-distill), Ray2333/Gemma-2B-rewardmodel-baseline (https://huggingface.co/Ray2333/Gemma-2…
-
Hi, I just followed your architecture and ran the code based on https://github.com/Toshihiro-Ota/decision-mamba, but the training time is unacceptable: one epoch takes 8 hours. Do you have any suggestio…
-
Hi, Thank you very much for your work!
Could you please release your model checkpoints, such as the SFT model and Reward model for each experiment, on Huggingface?
-
# Description
OpenAssistant has released on HF the reward models they trained on the open-source datasets. Even if they are not tailored to our needs, we could leverage them as a starting poin…
-
What are some of the intended use cases for the 0.5B model?
There are not many other similarly sized models, nor is there much hype around them, though the general audience seems to love th…