Closed PBerit closed 8 months ago
https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#tips-and-tricks-when-creating-a-custom-environment -> Why should I normalize the action space?
also in https://youtu.be/Ikngt0_DXJg?si=VBohQYo0nnpBT9vP&t=780
Please use the env checker and mind its warnings, and please use the "custom gym env" issue template next time.
Duplicate of https://github.com/hill-a/stable-baselines/issues/473 and others
As for the behavior, you probably used SAC or another off-policy algorithm; these use uniform sampling to warm-start the replay buffer.
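The normalization advice linked at the top of this reply can be sketched as follows: the agent sees a symmetric action space of [-1, 1], and the environment maps each action back to its physical range inside `step()`. This is only a minimal illustration, not code from the thread. `MAX_CHARGING_POWER` and `rescale_action` are hypothetical names, and the bound of 10000 is taken from the range stated in the question.

```python
import numpy as np

# Assumed physical bound, taken from the range stated in the question.
MAX_CHARGING_POWER = 10_000.0


def rescale_action(scaled_action, low, high):
    """Linearly map an action from [-1, 1] to [low, high]."""
    # Clip first so out-of-range policy outputs stay inside the bounds.
    scaled_action = np.clip(np.asarray(scaled_action, dtype=np.float32), -1.0, 1.0)
    return low + 0.5 * (scaled_action + 1.0) * (high - low)


# Inside a custom env's step(), the normalized action would be converted
# like this before being applied to the battery model.
physical = rescale_action([-1.0, 0.0, 0.5, 1.0], -MAX_CHARGING_POWER, MAX_CHARGING_POWER)
print(physical)
```

With this approach the action space would be declared as a Box from -1 to 1, and only the environment itself needs to know the physical limits.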
@araffin : Thanks for your answer. Unfortunately, I have to admit that I don't see how it answers my question. The core of my question is why stable-baselines3 prints the messages "Using cpu device", "Wrapping the env with a Monitor wrapper" and "Wrapping the env in a DummyVecEnv.", and why the algorithm uses very small actions afterwards. I am using the environment checkers of both stable-baselines3 and gymnasium, and I don't get a real warning but just " warnings.warn("
is why does stable-baselines3 print those messages
These are printed at the very beginning of training to give some info about what is happening and whether PyTorch is using the GPU. You can remove them by setting verbose=0.
and I don't get a real warning but just " warnings.warn("
Could you provide a minimal example to reproduce that? Is that really all you got in the output?
(we have tests to check those warnings: https://github.com/DLR-RM/stable-baselines3/blob/620e58e61f649d0f415b7796386d6fe405778026/tests/test_envs.py#L151-L166 )
@araffin : Thanks for your answer. Actually, when I reduce the action space to a box ranging from -1 to 1, the problem no longer occurs.
Still, it is really strange that with self.action_space = gym.spaces.Box(low=-1 * maximum_charging_power, high=maximum_charging_power, shape=(1,))
the magnitude of the actions changes right after stable-baselines3 prints those messages about the environment. Here is an example output:
2024-01-31 10:25:25.338805: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2024-01-31 10:25:25.338999: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
C:\Users\wi9632\Anaconda3\lib\site-packages\stable_baselines3\common\env_checker.py:441: UserWarning: We recommend you to use a symmetric and normalized Box action space (range=[-1, 1]) cf. https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html
warnings.warn(
index_current_time_slot_of_the_day: 1
action_battery_charging_before_adjustment: 6882.5
action_battery_charging: 0.3
index_current_time_slot_of_the_day: 2
action_battery_charging_before_adjustment: -3526.800048828125
action_battery_charging: 0.0
index_current_time_slot_of_the_day: 3
action_battery_charging_before_adjustment: 4588.2998046875
action_battery_charging: 0.3
index_current_time_slot_of_the_day: 4
action_battery_charging_before_adjustment: 4089.10009765625
action_battery_charging: 0.3
index_current_time_slot_of_the_day: 5
action_battery_charging_before_adjustment: 6322.5
action_battery_charging: 0.4
index_current_time_slot_of_the_day: 6
action_battery_charging_before_adjustment: -5229.60009765625
action_battery_charging: -0.0
index_current_time_slot_of_the_day: 7
action_battery_charging_before_adjustment: 5790.2001953125
action_battery_charging: 0.4
index_current_time_slot_of_the_day: 8
action_battery_charging_before_adjustment: 965.2000122070312
action_battery_charging: 0.4
index_current_time_slot_of_the_day: 9
action_battery_charging_before_adjustment: -6037.2998046875
action_battery_charging: -0.0
index_current_time_slot_of_the_day: 10
action_battery_charging_before_adjustment: 5256.39990234375
action_battery_charging: 0.3
index_current_time_slot_of_the_day: 11
action_battery_charging_before_adjustment: 5291.0
action_battery_charging: 0.4
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
index_current_time_slot_of_the_day: 12
action_battery_charging_before_adjustment: -0.6000000238418579
action_battery_charging: 0.0
index_current_time_slot_of_the_day: 13
action_battery_charging_before_adjustment: 0.5
action_battery_charging: 0.4
index_current_time_slot_of_the_day: 14
action_battery_charging_before_adjustment: -0.10000000149011612
action_battery_charging: -0.0
index_current_time_slot_of_the_day: 15
action_battery_charging_before_adjustment: 0.800000011920929
action_battery_charging: 0.4
index_current_time_slot_of_the_day: 16
action_battery_charging_before_adjustment: -0.5
action_battery_charging: -0.0
index_current_time_slot_of_the_day: 17
action_battery_charging_before_adjustment: 0.5
action_battery_charging: 0.3
index_current_time_slot_of_the_day: 18
action_battery_charging_before_adjustment: 1.600000023841858
action_battery_charging: 0.4
But I now see that there is in fact a full warning regarding the action space. However, I still can't explain why the magnitudes change drastically after the messages are printed.
There are two different things happening there. First the env checker runs, which samples the action space uniformly; then agent training starts (after the messages).
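The two phases can be imitated with plain NumPy to show why the printed magnitudes differ so much. This is a rough sketch under assumptions, not what the library does internally: it assumes the agent is an on-policy algorithm with a Gaussian policy that starts near mean 0 and standard deviation 1 (as the linked RL tips page describes). The thread does not state which algorithm was used, and the bounds of ±10000 come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
LOW, HIGH = -10_000.0, 10_000.0  # action bounds as stated in the question

# Phase 1 (env checker): actions drawn uniformly over the whole Box range.
checker_actions = rng.uniform(LOW, HIGH, size=1000)

# Phase 2 (start of training, assumed Gaussian policy): samples around 0
# with unit standard deviation, clipped to the action bounds.
policy_actions = np.clip(rng.normal(0.0, 1.0, size=1000), LOW, HIGH)

print("mean |action| from uniform sampling:", np.abs(checker_actions).mean())
print("max  |action| from untrained policy:", np.abs(policy_actions).max())
```

Under these assumptions the clipping never comes into play for the policy samples, which is why an unnormalized range of ±10000 leaves the agent exploring only a tiny part of the action space.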
❓ Question
Hi all,
I have the following code of a gymnasium environment in combination with stable-baselines 3:
The action space ranges from -10000 to 10000. I print the output during training, and in the first few iterations the magnitudes of the actions span the full range. Then suddenly I get a strange message from stable-baselines 3:
Afterwards the magnitude of the action value becomes extremely small (between -2 and 2), which does not make sense for my environment. Can someone explain why stable-baselines 3 reports these messages and then suddenly changes the magnitude of the action variable? And of course, I would like to know how to stop stable-baselines 3 from doing that.