[Question] How to use gsde in PPO

DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

https://stable-baselines3.readthedocs.io

MIT License

[Question] How to use gsde in PPO #1945

Closed CAI23sbP closed 1 week ago

CAI23sbP commented 3 weeks ago

❓ Question

@araffin The docs do not explain how to use gSDE. Could you explain how to use it with PPO? P.S. My stable-baselines3 version is v2.0.0 and my settings are: use_sde = True, full_std = True, log_std_init = -2, sde_sample_freq = 4

"""
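Custom rollout loop, adapted from imitation.data.rollout.
Assumes `policy`, `venv`, `use_sde` and `sde_sample_freq` are already defined.
"""
n_steps = 0
done = False
policy.reset_noise(venv.num_envs)
obs = venv.reset()
while not done:
    if use_sde and sde_sample_freq > 0 and n_steps % sde_sample_freq == 0:
        # Sample a new noise matrix
        policy.reset_noise(venv.num_envs)

    n_steps += 1

    # `predict` takes `deterministic`, not `deterministic_policy`
    acts, state = policy.predict(obs, deterministic=True)
    obs, rews, dones, infos = venv.step(acts)
    # Stop at the first finished episode; adapt this to your own stopping rule
    done = bool(dones.any())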

Checklist

[X] I have checked that there is no similar issue in the repo
[X] I have read the documentation
[X] If code there is, it is minimal and working
[X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.

araffin commented 3 weeks ago

Hello,

> The docs do not explain how to use gSDE. Could you explain?

Have you read the gSDE paper? What did you understand and what is not clear? Did you have a look at https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#ppo-policies ?

use_sde (bool) – Whether to use State Dependent Exploration or not

log_std_init (float) – Initial value for the log standard deviation

full_std (bool) – Whether to use (n_features x n_actions) parameters for the std instead of only (n_features,) when using gSDE

use_expln (bool) – Use expln() function instead of exp() to ensure a positive standard deviation (cf paper). It allows to keep variance above zero and prevent it from growing too fast. In practice, exp() is usually enough.

from https://github.com/DLR-RM/stable-baselines3/blob/4efee92fbad70f85aa094e27bd0a740274121795/stable_baselines3/common/policies.py#L427-L433
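As a rough illustration of how the parameters listed above fit together, here is a minimal sketch that builds a PPO model with gSDE enabled. `use_sde` and `sde_sample_freq` are passed to `PPO` itself, while `log_std_init` and `full_std` go through `policy_kwargs`. The helper names `gsde_ppo_kwargs` and `build_model`, and the choice of `Pendulum-v1` (a continuous-action env), are assumptions made for the example; the values are the ones from the question, not tuned recommendations.

```python
def gsde_ppo_kwargs(sde_sample_freq=4, log_std_init=-2.0, full_std=True):
    """Hypothetical helper: collect the gSDE-related arguments for PPO."""
    return dict(
        # Algorithm-level arguments
        use_sde=True,
        sde_sample_freq=sde_sample_freq,
        # Policy-level arguments are forwarded to the policy constructor
        policy_kwargs=dict(log_std_init=log_std_init, full_std=full_std),
    )


def build_model(env_id="Pendulum-v1"):
    """Hypothetical helper: create a PPO model with gSDE on a continuous-action env."""
    # Imported lazily so the kwargs helper can be inspected without SB3 installed
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make(env_id)
    return PPO("MlpPolicy", env, **gsde_ppo_kwargs())
```

Calling `build_model().learn(total_timesteps=10_000)` would then train with these settings; the timestep budget here is arbitrary.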

CAI23sbP commented 2 weeks ago

@araffin Sorry for the late reply! Yes, I read the docs. What I meant was: is it okay to use gSDE the way my example code does?

araffin commented 1 week ago

> Yes, I read the docs. What I meant was: is it okay to use gSDE the way my example code does?

If you mean "will it run?", then yes. If you mean "is it tailored for my problem?", that is hard to say, but at least those parameters were found to work for other envs. You can have a look at (and use) the RL Zoo for that: https://github.com/DLR-RM/rl-baselines3-zoo/blob/27e081eb24419ee843ae1c329b0482db823c9fc1/hyperparams/ppo.yml#L137
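To see what the resampling condition from the question does in practice, the check can be pulled out and run on its own. This is only a sketch: `resample_steps` is a hypothetical helper, not part of stable-baselines3, and it simply mirrors the `if` test in the rollout loop to list the steps at which `reset_noise` would be called.

```python
def resample_steps(total_steps, sde_sample_freq, use_sde=True):
    """Return the step indices at which the loop would resample the noise matrix."""
    steps = []
    for n_steps in range(total_steps):
        # Same condition as in the rollout loop from the question
        if use_sde and sde_sample_freq > 0 and n_steps % sde_sample_freq == 0:
            steps.append(n_steps)
    return steps


print(resample_steps(10, 4))   # [0, 4, 8]
print(resample_steps(10, -1))  # [] (noise is only sampled once, before the loop)
```

With `sde_sample_freq = 4` the same noise matrix is kept for four consecutive steps, and with a non-positive value the only sample is the one taken before the loop starts.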

CAI23sbP commented 1 week ago

Thank you for your reply~!