-
Hi,
May I ask why you used such a small batch size?
You mention in the paper that a larger batch size leads to a significant speed-up, so why is it still 32 in the standard implementat…
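For context, this is how I have been overriding it locally so far. It is only a sketch: the exact gin configurable to bind depends on which replay buffer the agent builds (e.g. `WrappedReplayBuffer` / `OutOfGraphReplayBuffer` in Dopamine, or the fixed-replay variants in batch_rl), so the name below is an assumption:
```
import gin

# Sketch: raise the replay batch size via gin before the runner/agent is built.
# 'WrappedReplayBuffer.batch_size' is an assumed binding name taken from
# Dopamine's replay buffer; check your installed version for the exact path.
gin.bind_parameter('WrappedReplayBuffer.batch_size', 256)
```
The same override can presumably also be passed on the command line through the training script's `--gin_bindings` flag, if it exposes one like Dopamine's `train.py` does.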
-
Currently only scalar actions are supported. How far up the priority list is expanding on this?
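For now I am working around it by flattening multi-dimensional discrete actions into a single scalar index and inverting the mapping afterwards; a minimal sketch (this is my own workaround, not part of the library):
```
import numpy as np

# Workaround sketch: bijectively map a multi-dimensional discrete action to a
# single scalar index so that code paths expecting scalar actions still work.
# `dims` is the number of choices per action dimension.
def flatten_action(action, dims):
    return int(np.ravel_multi_index(action, dims))

def unflatten_action(index, dims):
    return np.array(np.unravel_index(index, dims))

dims = (3, 3, 2)                    # e.g. a 3x3x2 MultiDiscrete action space
a = np.array([2, 1, 0])
idx = flatten_action(a, dims)       # -> 14
assert np.array_equal(unflatten_action(idx, dims), a)
```
Obviously this only helps for discrete spaces; native support would still be much nicer.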
-
Hi,
I've been trying to use the data from RL Unplugged at its native resolution (210x160). I'm hoping to replay the RL Unplugged actions from the sequential data release for Dopamine into the environ…
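Here is roughly what I am attempting; the environment id, frame skip, and sticky-action settings below are my own assumptions (older Gym step/reset API), and I realise the replay will only match the dataset exactly if they agree with the settings used when the data was collected:
```
import gym
import numpy as np

# Sketch: step a native-resolution (210x160) Atari env with logged actions.
# The 'NoFrameskip' variant keeps raw frames; the data was collected with
# frame skip (and possibly sticky actions), so each logged action is repeated.
env = gym.make('GravitarNoFrameskip-v4')
obs = env.reset()
assert obs.shape == (210, 160, 3)

logged_actions = np.load('actions_from_rl_unplugged.npy')  # hypothetical dump
for a in logged_actions:
    for _ in range(4):  # assumed frame skip of 4 during data collection
        obs, reward, done, info = env.step(int(a))
        if done:
            obs = env.reset()
            break
```
Is this roughly the intended way to drive the environment with the logged actions, or is there a recommended setup?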
-
Is the data distribution the same across the five runs for each dataset? Or does run 1 refer to data collected by a policy in the early stages of training, while run 5 refers to data from the last stage of training?
-
I am using the code in atari_dqn.ipynb to train a policy for Gravitar from scratch (on 1 run = 100 shards of data), and this is what my loss log looks like so far:
```
[Learner] Loss = 0.002 | Ste…
-
I am using the following command to try and generate data for one run of Freeway:
```
python -um batch_rl.baselines.train \
--base_dir=/tmp/batch_rl_data \
--gin_files='batch_rl/baselines/co…
-
The notebook is easy to get running, kudos for that. However, the results do not match those in the repository.
When I run it, the output of "Training Loop" is:
```
[Learner] Critic Loss = 4.062 | Policy L…
-
First of all, thanks for the great work!
I have a bunch of data in **_(state, action, reward, next state)_** format. I have been trying to understand how you parse the `$store$_action_ckpt` file in the code, but I fail…
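Here is as far as I have got: if I understand the format correctly, each `$store$_<element>_ckpt.<suffix>.gz` file is one gzipped array written with `np.save` by Dopamine's replay buffer, so the actions can be read like this (the path is only an example):
```
import gzip
import numpy as np

# Sketch: each '$store$_*_ckpt.<i>.gz' shard appears to be one gzipped numpy
# array written by Dopamine's replay buffer; actions load as a 1-D int array.
path = '/tmp/batch_rl_data/replay_logs/$store$_action_ckpt.0.gz'  # example path
with gzip.open(path, 'rb') as f:
    actions = np.load(f, allow_pickle=False)
print(actions.shape, actions.dtype)
```
What I am still unsure about is the reverse direction: how to pack my own (state, action, reward, next state) tuples into that layout so the loading code accepts them.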
-
- Solaar version (`solaar --version` and `git describe --tags`):
solaar 1.1.3
- Distribution:
Arch Linux
- Kernel version (ex. `uname -srmo`):
Linux 5.17.9-arch1-1 x86_64 GNU/Linux
- Outpu…
-
I'd like to retrain the online DQN agent in order to log some additional data during online training. The README says
> This data can be generated by running the online agents using batch_rl/basel…
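My current plan is roughly to subclass Dopamine's DQN agent and override its per-step hook to record the extra data. This is only a sketch; the module path and method signature below follow Dopamine's agent interface as I understand it and may differ across versions:
```
from dopamine.agents.dqn import dqn_agent

class LoggingDQNAgent(dqn_agent.DQNAgent):
    """Sketch: a DQN agent that records extra per-step data during online training."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.extra_log = []  # hypothetical container for whatever needs recording

    def step(self, reward, observation):
        action = super().step(reward, observation)
        # Record any additional quantities here before returning the action.
        self.extra_log.append({'reward': float(reward), 'action': int(action)})
        return action
```
Would that be compatible with the data-generation path the README describes, or is there a supported hook for this?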