-
I converted a Llama model to NeMo, with model dirs like below:
![image](https://github.com/NVIDIA/NeMo-Aligner/assets/6756880/2d36915a-a0ab-4c1a-8d20-0960a7948bdc)
When I tried to load it to train a…
-
you claim"If you prefer avoiding external paid APIs, we suggest using HuggingFace’s models (e.g. flan_t5_xl) as described in more detail in the [Supported models](https://github.com/nebuly-ai/nebull…
-
In the last few days I've been playing around, trying to see how fast I can get a 19M model training on a single 4090. My somewhat arbitrary goal is 1 hour, down from about 24 hours (just on `humanoid-…
-
Original Author: jandersonlee
Original Link: https://getsatisfaction.com/eternagame/topics/-strategy-market-switch-energy-model-agnostic
Reward designs that fold similarly in both energy models in ea…
-
It says:
"We first sample several models from the trained policy π(m, θ). For each sampled model, we compute its reward on a single minibatch sampled from the validation set. We then take only t…
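A rough sketch of how I read that passage; `policy.sample_model()` and the reward function are hypothetical stand-ins for whatever the paper's implementation actually uses:

```python
import random

def evaluate_sampled_models(policy, val_set, compute_reward, n_samples=8, minibatch_size=64):
    """Sample several models from the trained policy pi(m, theta) and score
    each one on a single minibatch drawn from the validation set."""
    scored = []
    for _ in range(n_samples):
        model = policy.sample_model()                       # hypothetical: m ~ pi(m, theta)
        minibatch = random.sample(val_set, minibatch_size)  # one validation minibatch
        scored.append((compute_reward(model, minibatch), model))
    return scored
```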
-
[2023-04-14 13:11:27,879] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 13266
[2023-04-14 13:11:27,885] [ERROR] [launch.py:434:sigkill_handler] ['/usr/bin/python3', '-u', 'main.py', '--lo…
-
### Description
According to "Anomaly scoring is based on overlapping segments: a true positive (TP) if a known anomalous window overlaps any detected windows, a false negative (FN) if a known anomal…
-
# 1.3 Elements of Reinforcement Learning
- *Policy*
- A policy defines the learning agent’s way of behaving at a given time.
- Roughly speaking, a policy is a mapping from perceived states of…
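A toy illustration of that definition: a tabular policy is literally a mapping from perceived states to actions (the state and action names below are made up):

```python
# Toy tabular policy: perceived state -> action to take in that state.
policy = {
    "low_battery": "recharge",
    "searching": "explore",
    "target_found": "pick_up",
}

def act(state):
    return policy[state]

print(act("low_battery"))  # recharge
```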
-
### 🐛 Describe the bug
Hi, I'm trying to use `ilql` training on custom data with `flan-t5-large` and `flan-t5-xl` models to fine-tune them using RLHF and `gpt-j-6B` as a reward model.
1. I have …
-
# Description
Currently we are supporting the following datasets:
- [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP)
- [Anthropic RLHF](https://huggingf…
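For quick reference, a sketch of pulling those datasets with the `datasets` library; the second Hub ID is my assumption of the intended Anthropic RLHF checkpoint, since the link above is truncated:

```python
from datasets import load_dataset

# Stanford Human Preferences Dataset (SHP)
shp = load_dataset("stanfordnlp/SHP", split="train")

# Anthropic RLHF data (assumed Hub ID)
hh = load_dataset("Anthropic/hh-rlhf", split="train")

print(shp[0].keys())
print(hh[0].keys())
```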