carlosnatalino / optical-rl-gym

Set of reinforcement learning environments for optical networks

Encountered problem when using RWAEnv #11

Open YujiaoHao opened 1 year ago

YujiaoHao commented 1 year ago

Hi there, I was trying to use your gym and stable_baselines3 for the RWA problem with the following code:

```python
# environment arguments for the simulation
# (topology, load, episode_length and callback are defined earlier in the script)
env_args = dict(topology=topology, seed=10, allow_rejection=True, load=load,
                mean_service_holding_time=25, episode_length=episode_length,
                num_spectrum_resources=64)
env = gym.make('RWA-v0', **env_args)

# here go the arguments of the policy network to be used
policy_args = dict(net_arch=5*[128])  # we use the elu activation function

agent = PPO(MlpPolicy, env, verbose=0, tensorboard_log="./tb/PPO-RWA-v0/",
            policy_kwargs=policy_args, gamma=.95, learning_rate=10e-6)

a = agent.learn(total_timesteps=10_000, callback=callback)
```

An error was encountered as follows:

```
Traceback (most recent call last):
  File "d:/optical-rl-gym-main/examples/stable_baselines3/SimpleRWA.py", line 136, in <module>
    a = agent.learn(total_timesteps=10_000, callback=callback)
  File "C:\Users\ \AppData\Local\Programs\Python\Python37\lib\site-packages\stable_baselines3\ppo\ppo.py", line 326, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "C:\Users\ \AppData\Local\Programs\Python\Python37\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 255, in learn
    progress_bar,
  File "C:\Users\ \AppData\Local\Programs\Python\Python37\lib\site-packages\stable_baselines3\common\base_class.py", line 489, in _setup_learn
    self._last_obs = self.env.reset()  # pytype: disable=annotation-type-mismatch
  File "C:\Users\ \AppData\Local\Programs\Python\Python37\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py", line 64, in reset
    self._save_obs(env_idx, obs)
  File "C:\Users\ \AppData\Local\Programs\Python\Python37\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py", line 96, in _save_obs
    self.buf_obs[key][env_idx] = obs[key]
KeyError: 'current_service'
```
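If I read the traceback correctly, DummyVecEnv builds its observation buffers from the keys of env.observation_space, and 'current_service' is declared there but missing from the dict that reset() actually returns (which uses the key 'service' instead). A quick way to see the mismatch (my own diagnostic snippet, assuming the classic gym API where reset() returns only the observation dict):

```python
# compare the keys declared in the observation space with the keys
# actually returned by the environment created above
print(sorted(env.observation_space.spaces.keys()))
print(sorted(env.reset().keys()))
```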

If I modify observation() in reset(), changing the key 'service' to 'current_service', it then reports:

```
Traceback (most recent call last):
  File "d:/optical-rl-gym-main/examples/stable_baselines3/SimpleRWA.py", line 136, in <module>
    a = agent.learn(total_timesteps=10_000, callback=callback)
  File "C:\Users\ \AppData\Local\Programs\Python\Python37\lib\site-packages\stable_baselines3\ppo\ppo.py", line 326, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "C:\Users\ \AppData\Local\Programs\Python\Python37\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 255, in learn
    progress_bar,
  File "C:\Users\ \AppData\Local\Programs\Python\Python37\lib\site-packages\stable_baselines3\common\base_class.py", line 489, in _setup_learn
    self._last_obs = self.env.reset()  # pytype: disable=annotation-type-mismatch
  File "C:\Users\ \AppData\Local\Programs\Python\Python37\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py", line 64, in reset
    self._save_obs(env_idx, obs)
  File "C:\Users\ \AppData\Local\Programs\Python\Python37\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py", line 96, in _save_obs
    self.buf_obs[key][env_idx] = obs[key]
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Service'
```
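The second error looks like it comes from the value rather than the key: DummyVecEnv allocates a numeric numpy buffer for each entry of the observation space, and the renamed entry now holds the raw Service object, which numpy cannot convert. A tiny illustration of that failure mode (hypothetical stand-in class, just to show where the message comes from):

```python
import numpy as np

class Service:  # stand-in for optical_rl_gym's Service object
    pass

buf = np.zeros(1, dtype=np.int64)  # buffer like the ones DummyVecEnv allocates
buf[0] = Service()  # raises: TypeError: int() argument must be ... not 'Service'
```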

Actually, running the environment checker

```python
from stable_baselines3.common.env_checker import check_env

check_env(env)
```

shows the same error. Please help me with this, thanks.
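In case it is useful until the environment itself is fixed: maybe a wrapper that drops the offending entry from both the observation space and the returned dict could serve as a stop-gap? A rough sketch, untested, and only assuming from the traceback that 'current_service' is the problematic key of the Dict space:

```python
import gym


class DropCurrentService(gym.ObservationWrapper):
    """Stop-gap sketch (hypothetical): remove the 'current_service' entry so
    DummyVecEnv only has to store array-like values."""

    def __init__(self, env):
        super().__init__(env)
        # rebuild the observation space without the offending key
        spaces = {k: v for k, v in env.observation_space.spaces.items()
                  if k != 'current_service'}
        self.observation_space = gym.spaces.Dict(spaces)

    def observation(self, observation):
        # keep only the keys that remain in the reduced observation space
        return {k: observation[k] for k in self.observation_space.spaces}


# usage with the env created above:
# env = DropCurrentService(gym.make('RWA-v0', **env_args))
```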

miilue commented 2 months ago

I have the same problem.