DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

How to successfully plot DRL results after code execution in stable-baselines3? #778

Closed · Michael-HK closed this issue 2 years ago

Michael-HK commented 2 years ago

Question

After successfully training the SAC DRL algorithm on my custom RL environment with stable-baselines3, I tried to plot the results using both the built-in results plotter and the custom plot function provided in the repository (Google Colab notebook). The custom plot function raises the error shown under Additional context, while the attached image shows the incomplete plot produced by the built-in plot function.

My question is: how can I address this error and successfully plot the reward graph?

Thanks!

Stable-baselines3 training code:

```python
from stable_baselines3 import SAC

Hyperparameters = {
    'gamma': 0.9,
    'learning_rate': 0.0006223187933819779,
    'buffer_size': 10000,
    'batch_size': 1024,
    'learning_starts': 1000,
    'train_freq': 256,
    'tau': 0.02,
    'ent_coef': 0.05,
    'policy_kwargs': dict(net_arch=dict(pi=[200, 300], qf=[200, 300])),
}

# SaveOnBestTrainingRewardCallback is the custom callback from the example notebook
callback = SaveOnBestTrainingRewardCallback(check_freq=100, log_dir=log_dir)
model_sac = SAC('MlpPolicy', env, verbose=1, **Hyperparameters)
model_sac.learn(total_timesteps=50000, callback=callback, log_interval=100)
model_sac.save("sac_ZcmesEnv2")
```

Plotting with the built-in function:

```python
from stable_baselines3.common import results_plotter

# Helper from the library
results_plotter.plot_results([log_dir], 20000, results_plotter.X_TIMESTEPS, "SAC Algorithm training")
```

Using the custom plotting function provided in the repository's Colab notebook:

```python
plot_results(log_dir, title='SAC Algorithm training')
```

Additional context

```
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_4896/1740952596.py in <module>
----> 1 plot_results(log_dir, title='SAC Algorithm training')

~\AppData\Local\Temp/ipykernel_4896/3865878996.py in plot_results(log_folder, title)
     26
     27     fig = plt.figure(title)
---> 28     plt.plot(x, y)
     29     plt.xlabel('Episodes')
     30     plt.ylabel('Rewards')
..............
C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_base.py in _plot_args(self, tup, kwargs, return_kwargs)
    499
    500         if x.shape[0] != y.shape[0]:
--> 501             raise ValueError(f"x and y must have same first dimension, but "
    502                              f"have shapes {x.shape} and {y.shape}")
    503         if x.ndim > 2 or y.ndim > 2:

ValueError: x and y must have same first dimension, but have shapes (3,) and (48,)
```
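For context, the error means the x and y arrays handed to plt.plot have different lengths. In a notebook-style helper this typically happens when the rewards are smoothed with a moving average but the x values are not truncated to match. A minimal sketch of an aligned version (assuming the load_results/ts2xy helpers from SB3 and a smoothing window of 50; both are guesses about the notebook code):

```python
import numpy as np
import matplotlib.pyplot as plt
from stable_baselines3.common.results_plotter import load_results, ts2xy

def moving_average(values, window):
    # Rolling mean; "valid" mode shortens the output by window - 1
    weights = np.repeat(1.0, window) / window
    return np.convolve(values, weights, "valid")

def plot_results(log_folder, title="Learning Curve"):
    # Load monitor.csv logs and convert to (timesteps, episode rewards)
    x, y = ts2xy(load_results(log_folder), "timesteps")
    y = moving_average(y, window=50)
    x = x[len(x) - len(y):]  # truncate x so both arrays share the same length
    plt.figure(title)
    plt.plot(x, y)
    plt.xlabel("Number of Timesteps")
    plt.ylabel("Rewards")
    plt.title(title)
    plt.show()
```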

[image: incomplete reward plot produced by the built-in results plotter]

Miffyli commented 2 years ago

Please include the full code next time for the full picture. I highly recommend using the RL Zoo to run your experiments, as it contains all the bells and whistles to record stats and plot results.

Otherwise, make sure to carefully study this example, which you probably already saw. You might be missing the Monitor wrapper around your env, which is important :).
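For illustration, a minimal sketch of wrapping an environment with Monitor so that monitor.csv files are written for the plotting helpers (the environment id and log_dir here are placeholders):

```python
import os
import gym
from stable_baselines3.common.monitor import Monitor

log_dir = "./logs/"  # placeholder: same directory later passed to the plotters
os.makedirs(log_dir, exist_ok=True)

env = gym.make("Pendulum-v1")  # placeholder: substitute your custom environment
env = Monitor(env, log_dir)    # records episode rewards/lengths to monitor.csv
```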

Please close the issue if the question is resolved and you have no bugs to report/enhancements to propose.

PS: protip for the future: you can have pretty code in GitHub comments by placing it inside triple backticks (```) :)

araffin commented 2 years ago

Hello, as @Miffyli said, please use the RL Zoo; it contains training and plotting scripts (instructions are in the README). The Monitor wrapper is also required when you want to plot the training reward.

Please fill in the custom env template completely next time ;) (and check that you have multiple episodes, as the results plotter computes a moving average over 100 episodes by default).
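As a quick sanity check on the number of recorded episodes (a sketch using the load_results and ts2xy helpers that ship with SB3):

```python
from stable_baselines3.common.results_plotter import load_results, ts2xy

# Load every monitor.csv in the log directory
x, y = ts2xy(load_results(log_dir), "timesteps")
print(f"{len(y)} completed episodes recorded")
# With fewer episodes than the smoothing window (100 by default),
# the averaged curve can be empty or much shorter than expected.
```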

Related to #356 (missing plotting documentation).