-
Hi.
Is there any way to evaluate a model trained with the RNaD algorithm against a random agent (as in `tic_tac_toe_dqn_vs_tabular.py`, for example)?
In tic_tac_toe_dqn_vs_tabular.py, the action is take…
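A generic evaluation loop of this kind can be sketched as below. This is only a minimal sketch: the `new_game` state interface (`is_terminal`, `current_player`, `legal_actions`, `apply_action`, `returns`) mirrors OpenSpiel's state API, but `evaluate_vs_random` and `trained_policy` are hypothetical names, not library functions.

```python
import random

def evaluate_vs_random(new_game, trained_policy, episodes=100, seed=0):
    """Play `episodes` games of the trained agent (player 0) against a
    uniform-random opponent (player 1); return player 0's average return.

    `new_game()` must return a state exposing an OpenSpiel-like interface:
    .is_terminal(), .current_player(), .legal_actions(),
    .apply_action(a), .returns().
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        state = new_game()
        while not state.is_terminal():
            if state.current_player() == 0:
                action = trained_policy(state)          # trained agent
            else:
                action = rng.choice(state.legal_actions())  # random opponent
            state.apply_action(action)
        total += state.returns()[0]
    return total / episodes
```

The same loop works for any turn-based two-player game as long as the state object exposes those five methods.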
-
The default Adam optimizer has a `fused` flag, which, according to the docs, is significantly faster than the default when used on CUDA. Using it with PPO generates an exception, which complains that …
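For reference, enabling the flag looks roughly like the sketch below. The `fused` parameter of `torch.optim.Adam` is real; the fallback-to-default when no GPU is present is my own assumption about a safe way to use it, not something the issue prescribes.

```python
import torch

model = torch.nn.Linear(4, 2)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# fused=True fuses the Adam update into a single kernel on CUDA;
# fall back to the default implementation when no GPU is available
opt = torch.optim.Adam(model.parameters(), lr=1e-3,
                       fused=torch.cuda.is_available())

loss = model(torch.randn(8, 4, device=device)).sum()
loss.backward()
opt.step()
```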
-
(1).
- When I tested the code with the SCPO methods on the Goal_Point_8Hazards and Goal_Point_8Pillars tasks, only the "hazard" task showed convergence of the cost performance, not the "pillar"-related tasks. (see red cos…
-
**Is your feature request related to a problem? Please describe.**
I notice that currently, if we need to load a trained model, we must first instantiate an agent. Take `PPO` as an example: we need…
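The requested pattern is commonly implemented as a `classmethod` constructor that rebuilds the agent from the checkpoint alone (as in Stable-Baselines3's `PPO.load(path)`). A minimal sketch, with a hypothetical `Agent` class and a pickle checkpoint format standing in for the real ones:

```python
import pickle

class Agent:
    def __init__(self, lr=3e-4):
        self.lr = lr
        self.weights = None  # stand-in for real model parameters

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump({"lr": self.lr, "weights": self.weights}, f)

    @classmethod
    def load(cls, path):
        # reconstruct the agent entirely from the checkpoint,
        # so the caller never has to instantiate one first
        with open(path, "rb") as f:
            ckpt = pickle.load(f)
        agent = cls(lr=ckpt["lr"])
        agent.weights = ckpt["weights"]
        return agent
```

With this pattern, `Agent.load("ckpt.pkl")` is a one-liner; the hyperparameters needed to rebuild the agent travel inside the checkpoint itself.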
-
### 🚀 Describe the improvement or the new tutorial
For historical reasons, TorchRL privately hosts a bunch of tutorials.
We'd like to bring the most significant ones to pytorch tutorials for more vi…
-
Hi, I have some questions and hope for your answers.
1. How should I understand "gold"? For example, gold distributions, gold trajectories, etc. Does it mean "oracle"? And how did you obtain this data?
2. When I run finetune_en…
-
Stable Baselines 3 has natively integrated hyperparameter tuning via https://github.com/DLR-RM/rl-baselines3-zoo. Generally in reinforcement learning research, trying hyperparameter tuning is almost r…
-
### Description
When I use `jit` and `vmap` on a function with `concatenate` and `dot` as below:
```python
def f(a: jax.Array, c1: jax.Array, c2: jax.Array) -> jax.Array:
    '''A common opera…
```
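A self-contained function of this shape, composed with `jit` and `vmap`, looks like the sketch below. The body is my own assumption of what the truncated function does (concatenate two constant vectors, then take a dot product with the batched input), not the reporter's exact code.

```python
import jax
import jax.numpy as jnp

def f(a: jax.Array, c1: jax.Array, c2: jax.Array) -> jax.Array:
    # concatenate the two constant vectors, then dot with the input vector
    c = jnp.concatenate([c1, c2])
    return jnp.dot(a, c)

# vmap over the first argument only; c1 and c2 are shared across the batch
batched = jax.jit(jax.vmap(f, in_axes=(0, None, None)))

a = jnp.ones((3, 4))      # batch of 3 vectors of length 4
c1 = jnp.ones((2,))
c2 = jnp.ones((2,))
out = batched(a, c1, c2)  # shape (3,)
```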
-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…
-
### What happened + What you expected to happen
Both the [overview of algorithms](https://docs.ray.io/en/latest/rllib/rllib-algorithms.html#) and the [README.md of dreamerv3](https://github.com/ray…