Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
https://sb3-contrib.readthedocs.io
MIT License
458 stars 167 forks

Recurrent PPO #179

Closed fede72bari closed 1 week ago

fede72bari commented 1 year ago

🐛 Bug

I ran Recurrent PPO on CartPole in a background notebook on Kaggle; after 6 hours the task crashed before finishing.

To Reproduce

It was a simple test on the CartPole environment. Here is the code:

# Imports (not shown in the original cell; presumably run in an earlier cell)
import os

import gymnasium as gym  # assumption: recent SB3; older releases use `import gym`
import torch as th
from sb3_contrib import RecurrentPPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecMonitor, VecNormalize

# Create log dir
log_dir = "/tmp/gym13/"
os.makedirs(log_dir, exist_ok=True)

#env = gym.make("CartPole-v1")
#env._max_episode_steps = 500000

env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
# Automatically normalize the input features and reward
env = VecNormalize(env,
                   norm_obs=True,
                   norm_reward=True)  # clip_obs=10. left at its default

# Logs will be saved in log_dir/monitor.csv
env = VecMonitor(env, log_dir)

total_steps = 2_000_000

# Mish activation; one 64-unit layer each for the policy (pi) and value (vf) networks
policy_kwargs = dict(activation_fn=th.nn.Mish,  # instead of ReLU
                     net_arch=dict(pi=[64], vf=[64]))

model = RecurrentPPO("MlpLstmPolicy",
                     env,
                     verbose=0,
                     policy_kwargs=policy_kwargs,
                     batch_size=128,
                     learning_rate=0.0001,
                     ent_coef=0)
                     # tensorboard_log="/ppo_cartpole_tensorboard/")
model.learn(total_timesteps=total_steps, progress_bar=True)

Relevant log output / Error message

18562.3s    585 Traceback (most recent call last):
18562.3s    586   File "/opt/conda/lib/python3.10/site-packages/nbclient/client.py", line 762, in _async_poll_output_msg
18562.3s    587     msg = await ensure_async(self.kc.iopub_channel.get_msg(timeout=None))
18562.3s    588   File "/opt/conda/lib/python3.10/site-packages/nbclient/util.py", line 96, in ensure_async
18562.3s    589     result = await obj
18562.3s    590   File "/opt/conda/lib/python3.10/site-packages/jupyter_client/channels.py", line 310, in get_msg
18562.3s    591     ready = await self.socket.poll(timeout)
18562.3s    592 asyncio.exceptions.CancelledError
18562.3s    593 
18562.3s    594 During handling of the above exception, another exception occurred:
18562.3s    595 
18562.3s    596 Traceback (most recent call last):
18562.3s    597   File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
18562.3s    598     return fut.result()
18562.3s    599 asyncio.exceptions.CancelledError
18562.3s    600 
18562.3s    601 The above exception was the direct cause of the following exception:
18562.3s    602 
18562.3s    603 Traceback (most recent call last):
18562.3s    604   File "/opt/conda/lib/python3.10/site-packages/nbclient/client.py", line 735, in _async_poll_for_reply
18562.3s    605     await asyncio.wait_for(task_poll_output_msg, self.iopub_timeout)
18562.3s    606   File "/opt/conda/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
18562.3s    607     raise exceptions.TimeoutError() from exc
18562.3s    608 asyncio.exceptions.TimeoutError
18562.3s    609 
18562.3s    610 During handling of the above exception, another exception occurred:
18562.3s    611 
18562.3s    612 Traceback (most recent call last):
18562.3s    613   File "<string>", line 1, in <module>
18562.3s    614   File "/opt/conda/lib/python3.10/site-packages/papermill/execute.py", line 113, in execute_notebook
18562.3s    615     nb = papermill_engines.execute_notebook_with_engine(
18562.3s    616   File "/opt/conda/lib/python3.10/site-packages/papermill/engines.py", line 49, in execute_notebook_with_engine
18562.3s    617     return self.get_engine(engine_name).execute_notebook(nb, kernel_name, **kwargs)
18562.3s    618   File "/opt/conda/lib/python3.10/site-packages/papermill/engines.py", line 367, in execute_notebook
18562.3s    619     cls.execute_managed_notebook(nb_man, kernel_name, log_output=log_output, **kwargs)
18562.3s    620   File "/opt/conda/lib/python3.10/site-packages/papermill/engines.py", line 436, in execute_managed_notebook
18562.3s    621     return PapermillNotebookClient(nb_man, **final_kwargs).execute()
18562.3s    622   File "/opt/conda/lib/python3.10/site-packages/papermill/clientwrap.py", line 45, in execute
18562.3s    623     self.papermill_execute_cells()
18562.3s    624   File "/opt/conda/lib/python3.10/site-packages/papermill/clientwrap.py", line 72, in papermill_execute_cells
18562.3s    625     self.execute_cell(cell, index)
18562.3s    626   File "/opt/conda/lib/python3.10/site-packages/nbclient/util.py", line 84, in wrapped
18562.3s    627     return just_run(coro(*args, **kwargs))
18562.3s    628   File "/opt/conda/lib/python3.10/site-packages/nbclient/util.py", line 62, in just_run
18562.3s    629     return loop.run_until_complete(coro)
18562.3s    630   File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
18562.3s    631     return future.result()
18562.3s    632   File "/opt/conda/lib/python3.10/site-packages/nbclient/client.py", line 949, in async_execute_cell
18562.3s    633     exec_reply = await self.task_poll_for_reply
18562.3s    634   File "/opt/conda/lib/python3.10/site-packages/nbclient/client.py", line 739, in _async_poll_for_reply
18562.3s    635     raise CellTimeoutError.error_from_timeout_and_cell(
18562.3s    636 nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 4 seconds.
18562.3s    637 The message was: Timeout waiting for IOPub output.
18562.3s    638 Here is a preview of the cell contents:
18562.3s    639 -------------------
18562.3s    640 ['# Create log dir', 'log_dir = "/tmp/gym13/"', 'os.makedirs(log_dir, exist_ok=True)', '', '#env = gym.make("CartPole-v1")']
18562.3s    641 ...
18562.3s    642 ['#     # VecEnv resets automatically', '#     # if done:', '#     #   obs = env.reset()', '', '# env.close()']
18562.3s    643 -------------------
18562.3s    644 
18564.5s    645 /opt/conda/lib/python3.10/site-packages/traitlets/traitlets.py:2930: FutureWarning: --Exporter.preprocessors=["remove_papermill_header.RemovePapermillHeader"] for containers is deprecated in traitlets 5.0. You can pass `--Exporter.preprocessors item` ... multiple times to add items to a list.
18564.5s    646   warn(
18564.5s    647 [NbConvertApp] WARNING | Config option `kernel_spec_manager_class` not recognized by `NbConvertApp`.
18564.5s    648 [NbConvertApp] Converting notebook __notebook__.ipynb to notebook
18564.9s    649 [NbConvertApp] Writing 87181 bytes to __notebook__.ipynb
18566.7s    650 /opt/conda/lib/python3.10/site-packages/traitlets/traitlets.py:2930: FutureWarning: --Exporter.preprocessors=["nbconvert.preprocessors.ExtractOutputPreprocessor"] for containers is deprecated in traitlets 5.0. You can pass `--Exporter.preprocessors item` ... multiple times to add items to a list.
18566.7s    651   warn(
18566.7s    652 [NbConvertApp] WARNING | Config option `kernel_spec_manager_class` not recognized by `NbConvertApp`.
18566.7s    653 [NbConvertApp] Converting notebook __notebook__.ipynb to html
18567.7s    654 [NbConvertApp] Writing 365517 bytes to __results__.html
18567.9s    655 

System Info

No response

araffin commented 1 year ago

Hello, the traceback is not complete. I suspect the problem comes from the Kaggle notebook; there is probably a timeout.

fede72bari commented 1 year ago

I copied only the pertinent part of the log. The part I left out, above it, covers the first few seconds, when some modules were installed; there is nothing in between. The strange thing is that I have run much longer scripts in the background on Kaggle, up to 10-11 hours, without any timeout.
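
For scale, the leading column of the log can be turned into a wall-clock duration. This is only a sketch: it assumes the `18562.3s` prefix is the number of seconds elapsed since the notebook session started, which the thread does not confirm, and `format_elapsed` is a hypothetical helper, not part of any library mentioned here.

```python
def format_elapsed(seconds):
    """Format a duration in seconds as 'Hh MMm SSs' (fractions of a second dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Timestamp taken from the first traceback line in the log above.
elapsed = format_elapsed(18562.3)
print(elapsed)  # 5h 09m 22s
```

Under that assumption, the first error appears a little over five hours into the session.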

araffin commented 1 year ago

I copied just the pertinent part of the log

The traceback doesn't say anything about why the process was terminated, and nothing in it points to SB3; it just contains a mix of timeout and cancelled errors.
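
One way to check this against a log of this shape is to pull out only the lines that name an exception class. The snippet below is a minimal sketch, not a Kaggle or papermill tool: `exception_chain` and both regular expressions are hypothetical, and they assume each line starts with a `<seconds>s <line number>` prefix as in the log above.

```python
import re

# Hypothetical patterns, written for the log layout shown in this issue.
PREFIX = re.compile(r"^\s*\d+(?:\.\d+)?s\s+\d+\s?")  # e.g. "18562.3s    592 "
EXC = re.compile(r"^([A-Za-z_][\w.]*(?:Error|Exception))(?::|$)")


def exception_chain(log_text):
    """Return the exception class names found in the log, in order of appearance."""
    names = []
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw, count=1)  # drop timestamp and log line number
        match = EXC.match(line)
        if match:
            names.append(match.group(1))
    return names


# Lines copied from the log above (only those that name an exception, plus one frame).
sample = """\
18562.3s    591     ready = await self.socket.poll(timeout)
18562.3s    592 asyncio.exceptions.CancelledError
18562.3s    599 asyncio.exceptions.CancelledError
18562.3s    608 asyncio.exceptions.TimeoutError
18562.3s    636 nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 4 seconds.
"""

chain = exception_chain(sample)
print(chain)

# True if any exception in the chain comes from SB3 or sb3-contrib.
mentions_sb3 = any("stable_baselines3" in n or "sb3_contrib" in n for n in chain)
print(mentions_sb3)
```

On the lines quoted here, every exception belongs to `asyncio` or `nbclient`, the notebook-execution layer, and none comes from `stable_baselines3` or `sb3_contrib`.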

fede72bari commented 1 year ago

Exactly, the only pieces of information are contextual: in other, longer runs I have not experienced a similar timeout problem. Let's see if others encounter the same problem with Kaggle background notebooks and SB3 Recurrent PPO.