-
### 🐛 Describe the bug
Hi,
There is something that is slightly unclear to me in the **summarize_rlhf** code -
I see that the tokenizer used everywhere is the pretrained tokenizer of `EleutherAI/gpt…
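For reference, a minimal sketch of that shared-tokenizer setup, assuming `transformers` and a GPT-family checkpoint; the exact name is truncated in the issue, so `MODEL_NAME` below is only a placeholder:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint: the name in the issue is truncated, so this is an assumption.
MODEL_NAME = "EleutherAI/gpt-j-6B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# GPT-style tokenizers ship without a pad token, so EOS is commonly reused for padding.
tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(
    ["a Reddit post to summarize TL;DR:"],
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
print(batch["input_ids"].shape)
```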
-
Create an example showing reward modeling. This could use an artificially limited synthetic reward source, or the HHH Anthropic data (already on the Stability cluster); a toy sketch follows below.
More ideas for tasks: https://…
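One reading of "an artificially limited synthetic reward source" is a hand-written scoring function with a hard cap; a toy sketch, with every heuristic invented for illustration:

```python
# Toy synthetic reward: a keyword bonus plus a brevity bonus, hard-capped at 1.0.
# Both heuristics and the cap are invented for illustration only.
def synthetic_reward(samples: list[str]) -> list[float]:
    rewards = []
    for text in samples:
        score = 0.5 if "helpful" in text.lower() else 0.0  # keyword bonus
        score += max(0.0, 1.0 - len(text) / 512)           # brevity bonus
        rewards.append(min(score, 1.0))                    # artificial cap
    return rewards

print(synthetic_reward(["a short, helpful answer", "x" * 600]))
```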
-
Hi,
I am currently working on a psychological project on estimating model parameters using numpyro MCMC inference. However, I've found no tutorials within the numpyro documentation to guide me.…
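In the absence of a tutorial, a minimal NUTS example is sketched below; the Bernoulli choice model, the fake data, and the parameter name `theta` are all hypothetical stand-ins, not the actual psychological model:

```python
import numpy as np
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

# Hypothetical model: infer a single choice probability from binary responses.
def model(choices):
    theta = numpyro.sample("theta", dist.Beta(1.0, 1.0))
    with numpyro.plate("trials", choices.shape[0]):
        numpyro.sample("obs", dist.Bernoulli(probs=theta), obs=choices)

choices = np.random.binomial(1, 0.7, size=200)  # fake data for the sketch
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), choices=choices)
mcmc.print_summary()
```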
-
Hello, Antoxnxpod! I saw you liked my repositories. Do you want to create a new project together?
-
### 🐛 Describe the bug
`0%| | 0/10000 [00:00`
-
This isn't something I think is in dire need; I just think it would be dope.
I imagine it so that you can select your party composition and a Boss Template, and the AI gives you a mitigation plan. It doe…
-
# Abstract
The entire project is composed of two milestones (milestones 1 and 2) and will take place over 12 months. It will consist of a scientific paper titled "An analysis of the two tokens NEO ec…
-
The organization I work for has a well-curated metadata catalog of datasets with a queryable autocomplete service. I work on a team that supports our Machine Learning teams, and a common feature re…
-
I want my reward function to depend on the prompt used. Mainly, I want to fine-tune an LM for a conditional generation task, e.g., summarization. It seems that the reward function expects only a list o…
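A sketch of one possible workaround, assuming the trainer really does pass only the concatenated prompt-plus-completion strings: close over the prompts and recover each one by prefix matching. `prompts`, `references`, and the word-overlap score are made up for illustration:

```python
# Made-up data: one prompt and a reference summary keyed by that prompt.
prompts = ["POST: the cat sat on the mat. TL;DR:"]
references = {prompts[0]: "a cat sat down"}

def reward_fn(samples: list[str]) -> list[float]:
    rewards = []
    for sample in samples:
        # Recover the prompt by prefix matching, then score the completion
        # against that prompt's reference with a toy word-overlap reward.
        prompt = next(p for p in prompts if sample.startswith(p))
        completion = sample[len(prompt):]
        overlap = set(completion.lower().split()) & set(references[prompt].lower().split())
        rewards.append(float(len(overlap)))
    return rewards

print(reward_fn([prompts[0] + " the cat sat down on a mat"]))
```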