-
Hello. I read your paper and am experimenting with path optimisation on a UR robot. First of all, thank you for making your code available. However, I found that the learning is not going well w…
-
Hello! I was wondering if it would be possible to release the pre-trained reward models in addition to the codec and video models that are already published (https://huggingface.co/tauhuang/diffusion_…
-
# Ecosystem Advancement (RFP): Tokenomics in Mina Report
- **Intent**: To publish a report on the tokenomics of Mina as well as potential designs of incentives and behaviors that could improve its …
-
This thread provides updates related to `IRewardService`.
-
When initializing the reward and reference models in step 3 of DeepSpeed-Chat, two kinds of DeepSpeed config files are used, i.e. `ds_config` and `ds_eval_config`. May I ask why we need to use two configs…
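One plausible reason for the split (stated as a general DeepSpeed pattern, not a claim about DeepSpeed-Chat's internals): trained models need a full training config with optimizer and ZeRO partitioning settings, while the frozen reward and reference models only run forward passes and can use a lighter eval-style config. A minimal sketch, assuming DeepSpeed's `deepspeed.initialize` API; the config values and the `reward_model` stand-in are illustrative placeholders:
```
import torch
import deepspeed

# Illustrative placeholder configs (not DeepSpeed-Chat's actual values):
# a training config with ZeRO partitioning for models being optimized,
# and an eval-style config for frozen models that only run forward passes.
ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 2},  # partition optimizer/gradient state
    "fp16": {"enabled": True},
}
ds_eval_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 0},  # frozen model: no optimizer state to shard
    "fp16": {"enabled": True},
}

reward_model = torch.nn.Linear(16, 1)  # stand-in for the real reward model
# The frozen reward/reference models would be wrapped with the eval config;
# the trained actor/critic would use ds_config instead.
reward_engine, *_ = deepspeed.initialize(model=reward_model, config=ds_eval_config)
```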
-
In the last few days I've been playing around trying to see how fast I can get a 19M model training on a single 4090. My somewhat arbitrary goal is 1 hour, down from about 24 hours (just on `humanoid-…
-
To implement SARSA with experience replay:
- The memory module should not compute "targets" or TD error. Memory should just store state/action/reward/next state information, and provide it in batches …
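A minimal sketch of this division of labour, assuming a tabular Q-table for concreteness; the `ReplayMemory` class and all other names here are illustrative, not from any particular codebase:
```
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Stores raw transitions only; targets and TD errors live in the agent."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, next_action, done):
        self.buffer.append((state, action, reward, next_state, next_action, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        # Transpose the list of transitions into one array per field.
        return [np.array(field) for field in zip(*batch)]


def sarsa_batch_update(q_table, memory, batch_size, alpha=0.1, gamma=0.99):
    """The agent, not the memory, computes the SARSA TD targets."""
    s, a, r, s2, a2, done = memory.sample(batch_size)
    # SARSA bootstraps on the action actually taken in the next state.
    targets = r + gamma * q_table[s2, a2] * (1.0 - done)
    q_table[s, a] += alpha * (targets - q_table[s, a])
```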
-
### What happened + What you expected to happen
I can’t seem to replicate the original [PPO](https://arxiv.org/pdf/1707.06347) algorithm's performance when using RLlib's PPO implementation. The hyp…
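For comparison, here is a minimal sketch of pinning RLlib's PPO to the paper's Mujoco hyperparameters, assuming RLlib's classic `PPOConfig` API (parameter names vary across RLlib versions, and RLlib's defaults often differ from the paper's values, so each one is set explicitly):
```
from ray.rllib.algorithms.ppo import PPOConfig

# Illustrative mapping of the PPO paper's Mujoco settings onto RLlib's
# (pre-new-API-stack) config; names may differ in newer RLlib versions.
config = (
    PPOConfig()
    .environment("HalfCheetah-v4")
    .training(
        lr=3e-4,                # Adam stepsize
        gamma=0.99,             # discount factor
        lambda_=0.95,           # GAE parameter
        clip_param=0.2,         # PPO clipping epsilon
        num_sgd_iter=10,        # epochs per batch
        sgd_minibatch_size=64,
        train_batch_size=2048,  # horizon T in the paper
    )
)

algo = config.build()
result = algo.train()
```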
-
### 🐛 Describe the bug
When I try to use multi-GPU training with `accelerate`, I get an error.
Code:
```
import trlx
from peft import LoraConfig, TaskType
from trlx.data.configs import (
    Mod…
```
-
Different architectures can be used to implement the latent dynamics model for a model-based agent. Implement the following models in the subdirectories of `models/dynamics_models`. All dynamics model…
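One way to keep these architectures interchangeable is a shared base interface that every model under `models/dynamics_models` implements. A minimal sketch, assuming a latent-state-and-action in, next-latent out contract; the class and method names are hypothetical, not taken from the repo:
```
import torch
import torch.nn as nn


class LatentDynamicsModel(nn.Module):
    """Hypothetical shared interface: predict the next latent state
    from the current latent state and the action taken."""

    def forward(self, latent: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError


class MLPDynamicsModel(LatentDynamicsModel):
    """Simplest instance: a feed-forward network over [latent, action]."""

    def __init__(self, latent_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, latent, action):
        return self.net(torch.cat([latent, action], dim=-1))


# Any architecture satisfying the same forward signature can be swapped
# into the agent without other changes.
model = MLPDynamicsModel(latent_dim=32, action_dim=6)
next_latent = model(torch.randn(8, 32), torch.randn(8, 6))
```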