-
> We preprocess many open-source preference datasets into the standard format and upload them to the Hugging Face hub. You can find them [HERE](https://huggingface.co/collections/RLHFlow/standard-format…
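For context, a minimal sketch of how one of these standard-format datasets might be loaded with the `datasets` library. The dataset name below is a hypothetical placeholder, and the `chosen`/`rejected` fields are an assumption about the standard schema:

```python
from datasets import load_dataset

# Hypothetical dataset name; substitute one from the RLHFlow collection.
ds = load_dataset("RLHFlow/example-preference-dataset", split="train")

example = ds[0]
print(example["chosen"])    # assumed field: the preferred response/conversation
print(example["rejected"])  # assumed field: the dispreferred one
```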
-
**Problem:**
I have a previously trained model state-dict file, e.g., a reward model saved as `PATH/pytorch_model.bin`. When I try to reload it for further training with the ZeRO-3 optimizer, an error…
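One workaround, sketched below under the assumption that the checkpoint is a plain PyTorch `state_dict` rather than DeepSpeed's own checkpoint format, is to restore the weights into the unwrapped model before calling `deepspeed.initialize`, since ZeRO-3 only partitions the parameters after initialization:

```python
import torch
import deepspeed

model = build_model()  # hypothetical: construct the reward-model architecture

# Load the plain state dict on CPU and restore it *before* DeepSpeed wraps
# the model; after ZeRO-3 init the parameters are sharded across ranks.
state_dict = torch.load("PATH/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)

# ds_config is assumed to be a DeepSpeed config (dict or path) with ZeRO stage 3.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```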
-
python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_node
-
In many RL research fields, 'hard exploration' is a major problem: the agent needs to take many steps before it sees any reward, which in turn cripples its ability to learn efficiently. One …
-
Link to this in the contribution guidelines; spell out what is in scope, etc.
-
- [x] Figure out how (or if) they sample the variance of an ensemble of networks.
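For reference, a minimal sketch of one way this could be done (an assumption about what is meant, not necessarily the paper's method): run the input through every ensemble member and take the sample variance across members:

```python
import torch

def ensemble_variance(models, x):
    # Stack each member's prediction: shape (num_members, batch, ...).
    preds = torch.stack([m(x) for m in models], dim=0)
    # Unbiased sample variance across the ensemble dimension.
    return preds.var(dim=0, unbiased=True)
```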
-
```
epoch: 0|step: 259|ppo_ep: 1|act_loss: 0.0253753662109375|cri_loss: 0.2144775390625|unsuper_loss: 0.0
average reward score: 0.20556640625
-----------------------------------------------------------…
```
-
Hey Kevin,
I hope you are doing well. I noticed a small bug: the `step` function returns only `obs, reward, done, info` instead of `obs, reward, terminated, truncated, info`. I came across th…
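For illustration, a minimal shim that restores the five-tuple API (a sketch, not the actual fix): it splits `done` into `terminated`/`truncated` using the `TimeLimit.truncated` info key that older Gym versions set:

```python
import gym

class FiveTupleStep(gym.Wrapper):
    """Wrap an old-API env so step() returns (obs, reward, terminated, truncated, info)."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, reward, terminated, truncated, info
```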
-
Thank you for this great contribution; I'm sure it will help in developing RL summarization systems.
One thing I don't understand is how to interpret the values returned by the rewarder. I'd assume t…
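One common convention for Bradley-Terry-style reward models (an assumption here, not something this project confirms) is that the raw scores are unbounded logits, and only the *difference* between two scores for the same prompt is directly interpretable, as a preference probability:

```python
import torch

def preference_probability(score_a: float, score_b: float) -> float:
    # P(A preferred over B) = sigmoid(r_A - r_B) under a Bradley-Terry model.
    return torch.sigmoid(torch.tensor(score_a - score_b)).item()

print(preference_probability(0.21, -0.35))  # ≈ 0.64: A is mildly favored
```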
-