UCL-IFT / noisyenv

noisyenv: Simple Noisy Environment Augmentation for Reinforcement Learning
https://pypi.org/project/noisyenv/
MIT License

Does noisyenv work with discrete action spaces? #2

Closed · Karlheinzniebuhr closed this issue 1 year ago

Karlheinzniebuhr commented 1 year ago

I noticed that the paper only mentions applying the algorithms to continuous-control MuJoCo OpenAI Gym environments. Does noisyenv also work with discrete action spaces? In my case, I'm using a custom environment with two discrete action spaces.

example:

env = RandomUniformScaleReward(CryptoEnv(df=gymdf, window_size=window_size, frame_bound=training_frame_bound), 0.5, 1.5)
model = PPO('MlpPolicy', env)
model.learn(total_timesteps=1000000)
AssertionError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_34464/3771859328.py in <module>
     38 
     39 # create new model
---> 40 model = PPO('MlpPolicy', envs, policy_kwargs=policy_kwargs, ent_coef=0.01, batch_size=256, seed=seed)
     41 # model.clip_range = exponential_decay_clip_range
     42 

c:\ProgramData\mambaforge\lib\site-packages\stable_baselines3\ppo\ppo.py in __init__(self, policy, env, learning_rate, n_steps, batch_size, n_epochs, gamma, gae_lambda, clip_range, clip_range_vf, normalize_advantage, ent_coef, vf_coef, max_grad_norm, use_sde, sde_sample_freq, target_kl, stats_window_size, tensorboard_log, policy_kwargs, verbose, seed, device, _init_setup_model)
    102         _init_setup_model: bool = True,
    103     ):
--> 104         super().__init__(
    105             policy,
    106             env,

c:\ProgramData\mambaforge\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py in __init__(self, policy, env, learning_rate, n_steps, gamma, gae_lambda, ent_coef, vf_coef, max_grad_norm, use_sde, sde_sample_freq, stats_window_size, tensorboard_log, monitor_wrapper, policy_kwargs, verbose, seed, device, _init_setup_model, supported_action_spaces)
     79         supported_action_spaces: Optional[Tuple[Type[spaces.Space], ...]] = None,
     80     ):
---> 81         super().__init__(
     82             policy=policy,
     83             env=env,

c:\ProgramData\mambaforge\lib\site-packages\stable_baselines3\common\base_class.py in __init__(self, policy, env, learning_rate, policy_kwargs, stats_window_size, tensorboard_log, verbose, device, support_multi_env, monitor_wrapper, seed, use_sde, sde_sample_freq, supported_action_spaces)
    178 
...
--> 180                 assert isinstance(self.action_space, supported_action_spaces), (
    181                     f"The algorithm only supports {supported_action_spaces} as action spaces "
    182                     f"but {self.action_space} was provided"

AssertionError: The algorithm only supports (, , , ) as action spaces but Discrete(2) was provided

raadk commented 1 year ago

Yes, it does. Please see the example below using MountainCar.

import gymnasium as gym # 0.28.1
from noisyenv.wrappers import RandomUniformScaleReward
base_env = gym.make("MountainCar-v0")  # https://gymnasium.farama.org/environments/classic_control/mountain_car/
env = RandomUniformScaleReward(env=base_env, noise_rate=0.01, low=0.9, high=1.1)
observation, info = env.reset(seed=42)
env.step(env.action_space.sample())
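
To confirm end-to-end training on a discrete action space, a minimal sketch along the same lines (not from the thread; it assumes stable-baselines3 is installed and new enough to expect gymnasium-style spaces) would be:

import gymnasium as gym
from stable_baselines3 import PPO
from noisyenv.wrappers import RandomUniformScaleReward

base_env = gym.make("MountainCar-v0")  # action space is Discrete(3)
env = RandomUniformScaleReward(env=base_env, noise_rate=0.01, low=0.9, high=1.1)

model = PPO("MlpPolicy", env)          # PPO accepts Discrete action spaces
model.learn(total_timesteps=10_000)    # short run just to confirm training works

The wrapper only rescales rewards, so the underlying Discrete action space is passed through unchanged to the algorithm.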