-
Hi @AminHP @bionicles @super-pirata , could you please update the docs to include an explanation of the agent's asset pool and how the volume of an order is determined?
An example on how to change th…
-
**System Info:**
Memory: 500G
GPU: 8 * A100 80G
Question:
**Why does using multiple GPUs in the init of DeepSpeedRLHFEngine use much more memory compared to using a single GPU?**
**Reproduce:**
Copy mode…
-
Hi @takuseno,
First of all, thanks for the great work.
I have a question regarding the MOPO algorithm, specifically about the ProbabilisticEnsembleDynamics.
In the original [paper](https://arxiv.…
-
Recent approaches have proposed to enhance exploration using an intrinsic reward.
Among the techniques:
- [Intrinsic Curiosity Module](https://arxiv.org/abs/1705.05363): uses the loss of a forward m…
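As a toy illustration of the curiosity-style bonus the ICM entry describes, here is a minimal sketch where the intrinsic reward is the prediction error of a forward model in feature space. Note this is a hedged simplification: the real ICM learns both the feature encoder and the forward model with neural networks, whereas this sketch uses a fixed random linear forward model purely to show the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)

feat_dim, act_dim = 4, 2
# Toy, fixed forward-model weights (in ICM these would be learned).
W_s = rng.normal(size=(feat_dim, feat_dim)) * 0.1
W_a = rng.normal(size=(act_dim, feat_dim)) * 0.1

def intrinsic_reward(phi_s, action, phi_s_next):
    """Curiosity bonus: squared error of the forward model's prediction
    of the next state feature, given current feature and action."""
    pred_next = phi_s @ W_s + action @ W_a
    return 0.5 * float(np.sum((pred_next - phi_s_next) ** 2))

# A single transition: hard-to-predict next states earn a larger bonus.
phi_s = rng.normal(size=feat_dim)
action = rng.normal(size=act_dim)
phi_s_next = rng.normal(size=feat_dim)

r_int = intrinsic_reward(phi_s, action, phi_s_next)  # non-negative scalar
```

In practice this bonus is added (usually with a scaling coefficient) to the environment's extrinsic reward, so the agent is driven toward transitions its forward model still predicts poorly.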
-
**System information**
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- …
-
I'm training models on Mujoco environments with the PPO2 algorithm on the tf2 branch of the project. During training, reward is slowly getting higher as expected. What is not expected is, when trainin…
-
Hi, I am trying to load a reward model (the final layer is reward_head) without a value head. How can I achieve that?
In addition, I am not clear with https://github.com/OpenRLHF/OpenRLHF/blob/main/o…
-
Hello everyone,
I am currently working on branch 1.8.0 (due to compatibility with stormpy) and trying to solve timed reachability properties for Markov automata.
I have encountered a difference …
-
Hey Norman.
Thanks for the awesome blog post! The idea of representing control with Fourier series is neat.
However, when running the code I couldn't get beyond a reward of ~1. I followed your sug…
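For readers unfamiliar with the idea mentioned above, a control signal can be parameterized as a truncated Fourier series and the coefficients optimized instead of per-timestep actions. The sketch below is an illustrative assumption, not code from the blog post; the function name, coefficient layout, and number of harmonics are all hypothetical.

```python
import numpy as np

def fourier_control(t, a0, a, b, omega=2 * np.pi):
    """Evaluate a truncated Fourier series
    u(t) = a0 + sum_k a_k cos(k*omega*t) + b_k sin(k*omega*t)
    at time(s) t. `a` and `b` hold the cosine/sine coefficients."""
    t = np.asarray(t, dtype=float)
    u = np.full_like(t, a0, dtype=float)
    for k, (ak, bk) in enumerate(zip(a, b), start=1):
        u = u + ak * np.cos(k * omega * t) + bk * np.sin(k * omega * t)
    return u

# Example: a 2-harmonic control signal sampled over one period.
ts = np.linspace(0.0, 1.0, 5)
u = fourier_control(ts, a0=0.5, a=[1.0, 0.2], b=[0.0, -0.1])
```

The appeal of this parameterization is that an entire smooth, periodic control trajectory is described by a handful of coefficients, which shrinks the search space for the optimizer.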
-
"Hello, when I use the command to run the pre-trained model you provided, the following issue occurs. Could you please tell me the reason for this?
python main.py --env-id reach --load-from pretrai…