-
Hello, I get this error and have no clue what I am doing wrong. Here's my code:
```
for _ in range(self.K_epochs):
    # Evaluating old actions and values:
    …
```
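For context, the `K_epochs` loop in common PyTorch PPO implementations looks roughly like the sketch below; every name here (`policy.evaluate`, `old_logprobs`, `eps_clip`, `MseLoss`, `optimizer`, and `torch` being imported) is an assumption about the surrounding class, not the poster's actual code:
```
# Minimal sketch of a typical PPO update loop (method-body fragment;
# the attributes referenced via self are assumed, not confirmed).
for _ in range(self.K_epochs):
    # Evaluate old actions and values with the current policy.
    logprobs, state_values, dist_entropy = self.policy.evaluate(old_states, old_actions)

    # Probability ratio pi_theta(a|s) / pi_theta_old(a|s).
    ratios = torch.exp(logprobs - old_logprobs.detach())

    # Clipped surrogate objective plus value and entropy terms.
    advantages = rewards - state_values.detach()
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1 - self.eps_clip, 1 + self.eps_clip) * advantages
    loss = -torch.min(surr1, surr2) + 0.5 * self.MseLoss(state_values, rewards) - 0.01 * dist_entropy

    self.optimizer.zero_grad()
    loss.mean().backward()
    self.optimizer.step()
```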
-
When running step 3 with ZeRO stage 3 enabled for both the actor and critic models,
I get the following error (line numbers may be offset due to debug statements I've added):
```
File "/path/DeepSp…
```
-
Thanks for your outstanding work. I would like to ask: how should we generate offline datasets, such as the medium or medium-expert versions in D4RL? Also, is it possible to render states into images to …
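One common recipe, sketched below under assumptions (a gymnasium env and a fixed `policy` callable; none of these names come from this repo), is to roll out a snapshot of a partially trained policy, which is roughly how D4RL's "medium" datasets are built:
```
import gymnasium as gym
import numpy as np

# Minimal sketch: roll out a fixed policy and store transitions in
# D4RL-style arrays. `policy` is a hypothetical obs -> action callable.
def collect_dataset(env_name, policy, n_steps=100_000):
    env = gym.make(env_name)
    data = {k: [] for k in ("observations", "actions", "rewards",
                            "next_observations", "terminals")}
    obs, _ = env.reset()
    for _ in range(n_steps):
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        data["observations"].append(obs)
        data["actions"].append(action)
        data["rewards"].append(reward)
        data["next_observations"].append(next_obs)
        data["terminals"].append(terminated)
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
    return {k: np.asarray(v) for k, v in data.items()}
```
For image observations, making the env with `render_mode="rgb_array"` and calling `env.render()` each step yields frames that can be stored alongside the transitions.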
-
If I increase both HEIGHT and WIDTH from 5 to 10 while keeping the obstacles and the final goal at the same positions, the Deep SARSA network doesn't seem to converge. What do you think the problem is? Shoul…
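For reference, a minimal sketch of the Deep SARSA TD target (on-policy: the bootstrap uses the action actually taken in the next state; `q_net` and the tensor arguments are hypothetical, not this repo's code):
```
import torch
import torch.nn.functional as F

def sarsa_td_loss(q_net, s, a, r, s_next, a_next, done, gamma=0.99):
    # Q(s, a) for the actions actually taken.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # On-policy bootstrap Q(s', a'), zeroed at episode ends.
        q_next = q_net(s_next).gather(1, a_next.unsqueeze(1)).squeeze(1)
        target = r + gamma * (1.0 - done.float()) * q_next
    return F.mse_loss(q_sa, target)
```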
-
- [ ] I have marked all applicable categories:
+ [x] exception-raising bug
+ [x] RL algorithm bug
+ [ ] documentation request (i.e. "X is missing from the documentation.")
+ [ ] ne…
-
Hi,
There are two loss terms in the actor agent: the advantage loss and the entropy loss. Can you tell me why you add the entropy loss? I know the entropy weight decays from 1 to 0.0001, but I do not know why yo…
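For context, the entropy term is usually a bonus that keeps the policy stochastic early in training, encouraging exploration; decaying its weight then shifts the agent toward exploitation. A minimal sketch of how the two terms are typically combined (all names here are placeholders, not this repo's code):
```
import torch

def actor_loss(logits, actions, advantages, entropy_weight):
    dist = torch.distributions.Categorical(logits=logits)
    # Advantage (policy-gradient) term: raise log-probs of good actions.
    adv_loss = -(dist.log_prob(actions) * advantages).mean()
    # Entropy term: negative entropy, so minimizing it keeps the
    # policy from collapsing to a deterministic one too early.
    entropy_loss = -dist.entropy().mean()
    return adv_loss + entropy_weight * entropy_loss
```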
-
Noting these down for the [NeurIPS BBO challenge](http://bbochallenge.com/leaderboard):
- idea 1: generate more suggestions and only send the top `n_suggestions` ranked by value (see the sketch after this list).
- idea 2: gener…
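A minimal sketch of idea 1, assuming a candidate sampler and a surrogate scorer (`sample_candidates` and `surrogate` are hypothetical names, not challenge API):
```
import numpy as np

# Oversample candidates, score them with a surrogate model, and return
# only the best n_suggestions (minimization: lower score is better).
def suggest(sample_candidates, surrogate, n_suggestions, oversample=8):
    candidates = sample_candidates(n_suggestions * oversample)
    scores = surrogate(candidates)  # predicted objective values
    top = np.argsort(scores)[:n_suggestions]
    return [candidates[i] for i in top]
```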
-
```
self.Critic_return, self.advantage = trfl.sequence_advantage_critic_loss(
    self.baseline_, self.reward_, self.discount_, self.bootstrap_,
    lambda_=lambda_,
    …
```
-
I want to ask one more thing about the estimation of the discounted reward. The variable `discounted_reward` always starts at zero. However, if the episode has not ended, should it be the value estimation …
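For reference, a minimal sketch of the usual fix: when the rollout is truncated mid-episode, initialize the running return with the critic's value estimate of the last state instead of zero (function and argument names here are placeholders):
```
def discounted_returns(rewards, dones, last_value, gamma=0.99):
    # Backward pass over a rollout. If the final step is not terminal,
    # bootstrap from the critic's estimate V(s_T); the reset below makes
    # last_value irrelevant when the final step really ends the episode.
    R = last_value
    returns = []
    for r, done in zip(reversed(rewards), reversed(dones)):
        if done:
            R = 0.0  # true episode end: no future reward
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return returns
```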
-
Hi!
Let's bring the reinforcement learning course to the whole Russian-speaking community 🌏
Would you like to translate it? Please follow the 🤗 [TRANSLATING guide](https://github.com/huggingface/tran…