-
The policy is given the last recurrent state from the replay buffer and isn't reset at episode boundaries. In my case, the number of updates is set to the episode length, so I've added `rollou…
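For context, here is a minimal sketch of the masking pattern many recurrent PPO implementations use to handle this (hypothetical code, not necessarily this repo's): the hidden state carried over from the buffer is multiplied by a per-step mask, so it is zeroed exactly where an episode ended.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Illustrative sketch: a GRU policy whose hidden state is reset by masks."""

    def __init__(self, obs_dim, hidden_dim):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim)

    def forward(self, obs, hxs, masks):
        # obs:   (seq_len, batch, obs_dim)
        # hxs:   (1, batch, hidden_dim) -- last hidden state from the buffer
        # masks: (seq_len, batch, 1)    -- 0.0 on the step after a terminal state
        outputs = []
        for t in range(obs.size(0)):
            # Zeroing the hidden state here is what "resets" the policy at an
            # episode boundary; skipping it leaks state across episodes.
            hxs = hxs * masks[t].unsqueeze(0)
            out, hxs = self.gru(obs[t : t + 1], hxs)
            outputs.append(out)
        return torch.cat(outputs, dim=0), hxs
```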
-
Hi, first of all, thank you for sharing your code.
I've been trying to implement GAIL using expert demonstrations from your Google Drive. I used the hyper-parameters from gail_experts/readme and I …
-
While training, the number of frames used so far is computed as
`total_num_steps = (j + 1) * args.num_processes * args.num_steps`
Shouldn't this be multiplied by the number of stacked frames (def…
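For reference, this is the loop structure the formula assumes (a minimal sketch with assumed variable names): each update `j` collects `num_steps` transitions from each of `num_processes` parallel environments. Note that frame stacking changes only the shape of each observation, not the number of environment steps taken.

```python
# Minimal sketch of the counting the formula assumes (variable names assumed).
num_processes = 8   # parallel environments
num_steps = 128     # env steps collected per process per update

for j in range(10):
    # ... collect num_steps transitions in each of the num_processes envs ...
    total_num_steps = (j + 1) * num_processes * num_steps
    print(f"after update {j}: {total_num_steps} environment steps")
```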
-
Hi, how am I supposed to save expert demos in the PPO main script?
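One possible approach, as a hedged sketch (hypothetical code, not the repo's actual script, and the dict keys below are an assumption; check what the repo's expert-data loader actually expects): roll out the trained policy and save its trajectories with `torch.save`.

```python
import torch

def save_expert_demos(env, policy, num_episodes, path):
    """Hypothetical sketch: collect trajectories from a trained policy."""
    states, actions, rewards, lengths = [], [], [], []
    for _ in range(num_episodes):
        ep_states, ep_actions, ep_rewards = [], [], []
        obs, done = env.reset(), False
        while not done:
            with torch.no_grad():
                action = policy(torch.as_tensor(obs, dtype=torch.float32))
            obs_next, reward, done, _ = env.step(action.numpy())
            ep_states.append(obs)
            ep_actions.append(action.numpy())
            ep_rewards.append(reward)
            obs = obs_next
        states.append(ep_states)
        actions.append(ep_actions)
        rewards.append(ep_rewards)
        lengths.append(len(ep_states))
    # Assumed file layout; adjust keys to match the GAIL loader being used.
    torch.save({"states": states, "actions": actions,
                "rewards": rewards, "lengths": lengths}, path)
```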
-
**Description**:
The RL and IRL algorithms need tuning to perform well (especially the Adversarial ones). We need to put in some time to tune them and see if they can perform well if we want to use the…
-
Updates from:
- https://github.com/jacobhilton/deep_learning_curriculum (focus on transformers)
- Raschka book
1. Math prerequisites
Taking a derivative to find a point of minimum or maxim…
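As a quick refresher for that prerequisite, a standard worked example (illustrative, not from the original notes):

```latex
% Worked example: find the minimum of f(x) = x^2 - 4x + 1.
% Set the first derivative to zero, then check the second derivative.
\[
f(x) = x^2 - 4x + 1, \qquad
f'(x) = 2x - 4 = 0 \;\Rightarrow\; x = 2, \qquad
f''(x) = 2 > 0,
\]
% so x = 2 is a minimum, with f(2) = 4 - 8 + 1 = -3.
```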
-
I noticed that, across many implementations of actor-critic policies, the Rollout/Buffer/Trajectories object is inconsistent: some authors send the arrays to the device as tensors during insertio…
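To make the contrast concrete, here is a minimal sketch (hypothetical class and method names) of one of the two conventions: keep the buffer as NumPy arrays on the CPU and convert to device tensors only at sample time. The alternative is to allocate torch tensors on the device up front and write into them during insertion.

```python
import numpy as np
import torch

class RolloutBuffer:
    """Sketch: CPU-side NumPy storage, device conversion deferred to sampling."""

    def __init__(self, capacity, obs_dim, act_dim, device):
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.device = device
        self.ptr = 0

    def insert(self, obs, action, reward):
        # Insertion stays on the CPU: plain NumPy writes, no device transfer.
        self.obs[self.ptr] = obs
        self.actions[self.ptr] = action
        self.rewards[self.ptr] = reward
        self.ptr += 1

    def sample(self, batch_size):
        # Only the sampled minibatch is moved to the device, as tensors.
        idx = np.random.randint(0, self.ptr, size=batch_size)
        to = lambda x: torch.as_tensor(x, device=self.device)
        return to(self.obs[idx]), to(self.actions[idx]), to(self.rewards[idx])
```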
-
Noting these down for the [NeurIPS BBO challenge](http://bbochallenge.com/leaderboard)
- idea 1: generate more suggestions and only send the top `n_suggestions` ranked by value (see the sketch after this list).
- idea 2: gener…
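A minimal sketch of idea 1 (function and parameter names are assumptions, not the challenge's API): oversample candidate points, score them with a surrogate model, and submit only the top `n_suggestions` by predicted value.

```python
import numpy as np

def suggest(surrogate, sample_candidate, n_suggestions, oversample=10):
    """Sketch of idea 1: oversample candidates, keep the best-scoring ones.

    surrogate:        callable mapping a candidate to a predicted value
    sample_candidate: callable drawing one random point from the search space
    """
    candidates = [sample_candidate() for _ in range(oversample * n_suggestions)]
    scores = np.array([surrogate(c) for c in candidates])
    best = np.argsort(scores)[-n_suggestions:]  # highest predicted value
    return [candidates[i] for i in best]
```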
-
#2
Goal:
- Extract expert trajectories from PPO/etc. / From Interaction dataset
- Build / debug system
- Testing environment
- Do tutorials
  - https://bark-simulator.readthedocs.io/en/latest/a…
-
## Bug description
The RewardNet `predict_processed` method requires the `state`, `action`, `next_state`, and `done` arguments, even when the network was trained using only `state` and `action`.
For example, the [BasicRewardN…
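A short reproduction of the mismatch (assuming the `imitation` library's API; argument and parameter names may differ across versions): the net is configured to ignore `next_state` and `done`, yet callers must still pass placeholders for them.

```python
import numpy as np
import gym
from imitation.rewards.reward_nets import BasicRewardNet

env = gym.make("CartPole-v1")
# Configure the net to use only state and action.
net = BasicRewardNet(
    env.observation_space,
    env.action_space,
    use_next_state=False,
    use_done=False,
)

obs = np.array([env.observation_space.sample()])
act = np.array([env.action_space.sample()])

# predict_processed still demands next_state and done, so we pass
# placeholder values even though this net never looks at them.
reward = net.predict_processed(
    state=obs,
    action=act,
    next_state=obs,                       # placeholder, unused by this net
    done=np.zeros(len(obs), dtype=bool),  # placeholder, unused by this net
)
```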