-
Hi @AminHP @bionicles @super-pirata , could you please update the docs to include an explanation of the agent's asset pool and how the volume of an order is determined?
An example on how to change th…
-
**System Info:**
Memory: 500G
GPU: 8 * A100 80G
Question:
**Why does using multiple GPUs in the init of DeepSpeedRLHFEngine use much more memory compared to using a single GPU?**
**Reproduce:**
Copy mode…
-
Hi @takuseno,
First of all, thanks for the great work.
I have a question regarding the MOPO algorithm, specifically about the ProbabilisticEnsembleDynamics.
In the original [paper](https://arxiv.…
-
Recent approaches have proposed to enhance exploration using an intrinsic reward.
Among the techniques:
- [Intrinsic Curiosity Module](https://arxiv.org/abs/1705.05363): uses the loss of a forward m…
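As a toy illustration of the curiosity-style bonus the ICM entry describes, here is a minimal sketch where the intrinsic reward is the prediction error of a forward model in feature space. Note this is a hedged simplification: the real ICM learns both the feature encoder and the forward model with neural networks, whereas this sketch uses a fixed random linear forward model purely to show the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)

feat_dim, act_dim = 4, 2
# Toy, fixed forward-model weights (in ICM these would be learned).
W_s = rng.normal(size=(feat_dim, feat_dim)) * 0.1
W_a = rng.normal(size=(act_dim, feat_dim)) * 0.1

def intrinsic_reward(phi_s, action, phi_s_next):
    """Curiosity bonus: squared error of the forward model's prediction
    of the next state feature, given current feature and action."""
    pred_next = phi_s @ W_s + action @ W_a
    return 0.5 * float(np.sum((pred_next - phi_s_next) ** 2))

# A single transition: hard-to-predict next states earn a larger bonus.
phi_s = rng.normal(size=feat_dim)
action = rng.normal(size=act_dim)
phi_s_next = rng.normal(size=feat_dim)

r_int = intrinsic_reward(phi_s, action, phi_s_next)  # non-negative scalar
```

In practice this bonus is added (usually with a scaling coefficient) to the environment's extrinsic reward, so the agent is driven toward transitions its forward model still predicts poorly.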
-
**System information**
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- …
-
I'm training models on Mujoco environments with the PPO2 algorithm on the tf2 branch of the project. During training, reward is slowly getting higher as expected. What is not expected is, when trainin…
-
Hi, I am trying to load a reward model (the final layer is reward_head) without a value head. How can I achieve that?
In addition, I am not clear with https://github.com/OpenRLHF/OpenRLHF/blob/main/o…
-
Hello everyone,
I am currently working on branch 1.8.0 (due to compatibility with stormpy) and trying to solve timed reachability properties for Markov automata.
I have encountered a difference …
-
Hey Norman.
Thanks for the awesome blog post! The idea of representing control with Fourier series is neat.
However, when running the code I couldn't get beyond a reward of ~1. I followed your sug…
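For readers unfamiliar with the idea mentioned above, a control signal can be parameterized as a truncated Fourier series and the coefficients optimized instead of per-timestep actions. The sketch below is an illustrative assumption, not code from the blog post; the function name, coefficient layout, and number of harmonics are all hypothetical.

```python
import numpy as np

def fourier_control(t, a0, a, b, omega=2 * np.pi):
    """Evaluate a truncated Fourier series
    u(t) = a0 + sum_k a_k cos(k*omega*t) + b_k sin(k*omega*t)
    at time(s) t. `a` and `b` hold the cosine/sine coefficients."""
    t = np.asarray(t, dtype=float)
    u = np.full_like(t, a0, dtype=float)
    for k, (ak, bk) in enumerate(zip(a, b), start=1):
        u = u + ak * np.cos(k * omega * t) + bk * np.sin(k * omega * t)
    return u

# Example: a 2-harmonic control signal sampled over one period.
ts = np.linspace(0.0, 1.0, 5)
u = fourier_control(ts, a0=0.5, a=[1.0, 0.2], b=[0.0, -0.1])
```

The appeal of this parameterization is that an entire smooth, periodic control trajectory is described by a handful of coefficients, which shrinks the search space for the optimizer.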
-
"Hello, when I use the command to run the pre-trained model you provided, the following issue occurs. Could you please tell me the reason for this?
python main.py --env-id reach --load-from pretrai…