-
Hi, I have some questions about DPO:
1. Is there a reason for choosing the Nectar dataset to train offline vanilla DPO rather than using the same dataset as iterative DPO, for a possibly fairer comp…
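For reference, a minimal sketch of the vanilla (offline) DPO loss the question refers to, assuming per-sequence log-probabilities are already computed; the function and variable names are illustrative, not from this repo:

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Log-ratios of the trained policy vs. the frozen reference model
    # on the preferred and dispreferred responses (tensors of log-probs).
    chosen_logratio = pi_chosen - ref_chosen
    rejected_logratio = pi_rejected - ref_rejected
    # Maximize the margin between the two implicit rewards.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```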
-
## Task
Notes:
- The ITI is uniform over five different durations. We should do this instead of drawing from a geometric distribution (see the sketch below).
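A minimal sketch of the proposed sampling rule, with placeholder duration values (the actual five durations are an assumption):

```python
import numpy as np

rng = np.random.default_rng()
iti_durations_s = [1.0, 2.0, 3.0, 4.0, 5.0]  # placeholder values, not the task's real ITIs
iti = rng.choice(iti_durations_s)            # uniform over the five options, not geometric
```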
## Behavior
Decision times are slightly longer following switches (left pan…
-
### What happened + What you expected to happen
I can’t seem to replicate the original [PPO](https://arxiv.org/pdf/1707.06347) algorithm's performance when using RLlib's PPO implementation. The hyp…
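For concreteness, a minimal sketch (not necessarily the reporter's actual setup) of pointing RLlib's `PPOConfig` at the hyperparameters the paper reports for MuJoCo tasks; the environment choice is an assumption:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("Hopper-v4")        # hypothetical benchmark environment
    .training(
        lr=3e-4,                     # Adam step size from the paper
        gamma=0.99,                  # discount factor
        lambda_=0.95,                # GAE parameter
        clip_param=0.2,              # PPO clipping epsilon
        num_sgd_iter=10,             # SGD epochs per training batch
        sgd_minibatch_size=64,
        train_batch_size=2048,       # timesteps collected per update
    )
)
algo = config.build()
print(algo.train()["episode_reward_mean"])
```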
-
Thank you for sharing. But when I run your Sumo models from Fig. 1, 7, and 11, they often raise errors. In Fig. 7:
---------------------------------------------------------------------------
NameError …
-
- Behavior session metadata
- Which fields should go into the behavior session metadata?
https://github.com/AllenNeuralDynamics/dynamic-foraging-task/issues/303#issuecomment-2062196234
- Existi…
-
https://github.com/chromiecraft/chromiecraft/issues/6087
### What client do you play on?
enUS
### Faction
Alliance
### Content Phase:
Generic
### Current Behaviour
It seems bes…
-
This document lists the features on LMFlow's roadmap. We welcome any discussion of, or contributions to, the specific features in the related Issues/PRs. 🤗
### Main Features
* Data
* [x] DPO dataset format…
-
**What problem or use case are you trying to solve?**
Sometimes models fail to do their job correctly, and we would benefit from starting over from the beginning. There are a few examples of th…
-
Hi, currently reward_fn is independent of the environment class (mbrl.models.ModelEnv) and accepts actions and the next observation as input. In practice, a more general, environment-parameter-dependent re…
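A minimal sketch of the current convention described above: a reward function that sees only actions and next observations, handed to ModelEnv (the quadratic cost itself is a hypothetical example, not from the library):

```python
import torch

def reward_fn(actions: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
    # Batched quadratic cost: penalize distance from the origin and control effort.
    state_cost = (next_obs ** 2).sum(dim=-1, keepdim=True)
    action_cost = 0.01 * (actions ** 2).sum(dim=-1, keepdim=True)
    return -(state_cost + action_cost)

# model_env = mbrl.models.ModelEnv(env, dynamics_model, termination_fn, reward_fn)
```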
-
### Describe the feature
PPO training needs to keep four models in memory at the same time. The original implementation keeps the reward, actor, critic, and initial (reference) models in video RAM simultaneously.
…
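One way to relieve that pressure (a sketch of the idea, not this repo's implementation; the generate()/scoring calls are hypothetical) is to keep only the trainable actor and critic resident on the GPU and page the two frozen models in just for the rollout-scoring phase:

```python
import torch

def score_rollout(prompts, actor, reward_model, ref_model, device="cuda"):
    # Page the frozen models onto the GPU only while they are needed.
    reward_model.to(device)
    ref_model.to(device)
    with torch.no_grad():
        sequences = actor.generate(prompts)      # hypothetical generation API
        rewards = reward_model(sequences)        # hypothetical scoring call
        ref_logprobs = ref_model(sequences)
    # Move them back so PPO updates only hold actor + critic in video RAM.
    reward_model.to("cpu")
    ref_model.to("cpu")
    torch.cuda.empty_cache()
    return sequences, rewards, ref_logprobs
```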