-
Hello. I read your paper and am experimenting with path optimisation on a UR robot. First of all, thank you for making your code available. However, I found that the learning is not going well w…
-
Hello! I was wondering if it would be possible to release the pre-trained reward models in addition to the codec and video models that are already published (https://huggingface.co/tauhuang/diffusion_…
-
# Ecosystem Advancement (RFP): Tokenomics in Mina Report
- **Intent**: To publish a report on the tokenomics of Mina as well as potential designs of incentives and behaviors that could improve its …
-
This thread provides updates related to `IRewardService`.
-
When initializing the reward and reference models in step 3 of DeepSpeed-Chat, two kinds of DeepSpeed config files are used, i.e. `ds_config` and `ds_eval_config`. May I ask why we need to use two configs…
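One plausible reason for the split (stated as a general DeepSpeed pattern, not a claim about DeepSpeed-Chat's internals): trained models need a full training config with optimizer and ZeRO partitioning settings, while the frozen reward and reference models only run forward passes and can use a lighter eval-style config. A minimal sketch, assuming DeepSpeed's `deepspeed.initialize` API; the config values and the `reward_model` stand-in are illustrative placeholders:
```
import torch
import deepspeed

# Illustrative placeholder configs (not DeepSpeed-Chat's actual values):
# a training config with ZeRO partitioning for models being optimized,
# and an eval-style config for frozen models that only run forward passes.
ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 2},  # partition optimizer/gradient state
    "fp16": {"enabled": True},
}
ds_eval_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 0},  # frozen model: no optimizer state to shard
    "fp16": {"enabled": True},
}

reward_model = torch.nn.Linear(16, 1)  # stand-in for the real reward model
# The frozen reward/reference models would be wrapped with the eval config;
# the trained actor/critic would use ds_config instead.
reward_engine, *_ = deepspeed.initialize(model=reward_model, config=ds_eval_config)
```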
-
In the last few days I've been playing around trying to see how fast I can get a 19M model training on a single 4090. My somewhat arbitrary goal is 1 hour, down from about 24 hours (just on `humanoid-…
-
To implement SARSA with experience replay:
- The memory module should not compute "targets" or TD error. Memory should just store state/action/reward/next state information, and provide it in batches …
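A minimal sketch of this division of labour, assuming a tabular Q-table for concreteness; the `ReplayMemory` class and all other names here are illustrative, not from any particular codebase:
```
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Stores raw transitions only; targets and TD errors live in the agent."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, next_action, done):
        self.buffer.append((state, action, reward, next_state, next_action, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        # Transpose the list of transitions into one array per field.
        return [np.array(field) for field in zip(*batch)]


def sarsa_batch_update(q_table, memory, batch_size, alpha=0.1, gamma=0.99):
    """The agent, not the memory, computes the SARSA TD targets."""
    s, a, r, s2, a2, done = memory.sample(batch_size)
    # SARSA bootstraps on the action actually taken in the next state.
    targets = r + gamma * q_table[s2, a2] * (1.0 - done)
    q_table[s, a] += alpha * (targets - q_table[s, a])
```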
-
### What happened + What you expected to happen
I can’t seem to replicate the original [PPO](https://arxiv.org/pdf/1707.06347) algorithm's performance when using RLlib's PPO implementation. The hyp…
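For comparison, here is a minimal sketch of pinning RLlib's PPO to the paper's Mujoco hyperparameters, assuming RLlib's classic `PPOConfig` API (parameter names vary across RLlib versions, and RLlib's defaults often differ from the paper's values, so each one is set explicitly):
```
from ray.rllib.algorithms.ppo import PPOConfig

# Illustrative mapping of the PPO paper's Mujoco settings onto RLlib's
# (pre-new-API-stack) config; names may differ in newer RLlib versions.
config = (
    PPOConfig()
    .environment("HalfCheetah-v4")
    .training(
        lr=3e-4,                # Adam stepsize
        gamma=0.99,             # discount factor
        lambda_=0.95,           # GAE parameter
        clip_param=0.2,         # PPO clipping epsilon
        num_sgd_iter=10,        # epochs per batch
        sgd_minibatch_size=64,
        train_batch_size=2048,  # horizon T in the paper
    )
)

algo = config.build()
result = algo.train()
```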
-
### 🐛 Describe the bug
When I try to use multi-GPU training with `accelerate`, I get an error.
Code:
```
import trlx
from peft import LoraConfig, TaskType
from trlx.data.configs import (
    Mod…
```
-
Different architectures can be used to implement the latent dynamics model for a model-based agent. Implement the following models in the subdirectories of `models/dynamics_models`. All dynamics model…
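One way to keep these architectures interchangeable is a shared base interface that every model under `models/dynamics_models` implements. A minimal sketch, assuming a latent-state-and-action in, next-latent out contract; the class and method names are hypothetical, not taken from the repo:
```
import torch
import torch.nn as nn


class LatentDynamicsModel(nn.Module):
    """Hypothetical shared interface: predict the next latent state
    from the current latent state and the action taken."""

    def forward(self, latent: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError


class MLPDynamicsModel(LatentDynamicsModel):
    """Simplest instance: a feed-forward network over [latent, action]."""

    def __init__(self, latent_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, latent, action):
        return self.net(torch.cat([latent, action], dim=-1))


# Any architecture satisfying the same forward signature can be swapped
# into the agent without other changes.
model = MLPDynamicsModel(latent_dim=32, action_dim=6)
next_latent = model(torch.randn(8, 32), torch.randn(8, 6))
```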