Closed oldwhitenotblack closed 9 months ago
Hey,
just flagging that I ran into similar issues with custom envs many times when I was starting out. Now I almost always avoid them by making sure my custom envs pass check_env
(from stable_baselines3.common.env_checker import check_env).
Of course, happy to help further if this does not solve your issue :) It would also help if you gave a bit more detail on your bug: a link to the repo where you defined your custom env would be great; otherwise, feel free to share more info on the environment right here.
If code there is, it is minimal and working
Closing because the minimum requirements for seeking help are not met.
🐛 Bug
When training SAC/PPO models with a custom environment, I consistently observe the model predicting the same action. Even when I set a large penalty in the environment to discourage repeated selection of the same action, the issue persists.
Code example
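from stable_baselines3 import SAC
from stable_baselines3.common.logger import configure
from stable_baselines3.common.monitor import Monitor

# `env` is the custom environment (its definition is not shown in this issue)
log_dir = "new_sac_log"
# note: new_log is created here but never attached to the model
new_log = configure(log_dir, ["stdout", "csv", "log"])

env = Monitor(env, log_dir)
model = SAC("MultiInputPolicy", env, tensorboard_log=log_dir, verbose=1)
model.learn(total_timesteps=10000, log_interval=2, tb_log_name="sac_10000_positiveIMP", progress_bar=True)
model.save("sac_pendulum")

del model  # remove to demonstrate saving and loading
model = SAC.load("sac_pendulum")

obs, info = env.reset()
while True:
    action, _states = model.predict(obs, deterministic=True)
    # append each predicted action to a text file
    with open("action_list.txt", "a") as file_a:
        file_a.write(str(action))
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break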
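One way to quantify the "same action every step" symptom is to collect the predicted actions in a list and measure their per-dimension spread. The sketch below is only illustrative: summarize_actions is a hypothetical helper, not part of stable_baselines3, and the sample actions are made up rather than taken from the model in this issue.

```python
import numpy as np


def summarize_actions(actions, atol=1e-6):
    """Report whether a sequence of actions is effectively constant."""
    # Stack the actions into shape (n_steps, action_dim).
    arr = np.asarray(actions, dtype=np.float64).reshape(len(actions), -1)
    # Range (max minus min) of each action dimension over the episode.
    spread = arr.max(axis=0) - arr.min(axis=0)
    return {
        "n_steps": arr.shape[0],
        "per_dim_spread": spread,
        "is_constant": bool(np.all(spread <= atol)),
    }


# Made-up actions for illustration only.
constant = [np.array([1.0]), np.array([1.0]), np.array([1.0])]
varied = [np.array([0.2]), np.array([-0.5]), np.array([0.9])]

print(summarize_actions(constant)["is_constant"])  # True
print(summarize_actions(varied)["is_constant"])    # False
```

Running the same summary on actions from model.predict(obs, deterministic=False) may help show whether only the deterministic (mean) action is stuck, or whether the sampled actions are stuck as well.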
Relevant log output / Error message
System Info
Checklist