PWhiddy / PokemonRedExperiments

Playing Pokemon Red with Reinforcement Learning
MIT License
6.89k stars 632 forks

Project crashes after a few runs #122

Open Muramas opened 11 months ago

Muramas commented 11 months ago

I had this issue before, and it still happens after a fresh clone. Training runs for a few cycles and then fails with the output below.

step: 16000 event: 8.00 level: 4.00 heal: 0.00 op_lvl: 0.00 dead: -0.40 badge: 0.00 explore: 37.08 sum: 48.68
Traceback (most recent call last):
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 35, in _worker
    observation, reward, terminated, truncated, info = env.step(data)
                                                       ^^^^^^^^^^^^^^
  File "H:\PokemonAI\PokemonRedExperiments\baselines\red_gym_env.py", line 224, in step
    self.save_and_print_info(step_limit_reached, obs_memory)
  File "H:\PokemonAI\PokemonRedExperiments\baselines\red_gym_env.py", line 401, in save_and_print_info
    plt.imsave(
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\site-packages\matplotlib\pyplot.py", line 2200, in imsave
    return matplotlib.image.imsave(fname, arr, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\site-packages\matplotlib\image.py", line 1689, in imsave
    image.save(fname, **pil_kwargs)
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\site-packages\PIL\Image.py", line 2429, in save
    fp = builtins.open(filename, "w+b")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 22] Invalid argument: 'session_30e6d259\curframe_a4836dd0.jpeg'
304229 pyboy.pyboy INFO ###########################
304229 pyboy.pyboy INFO # Emulator is turning off #
304230 pyboy.pyboy INFO ###########################
Traceback (most recent call last):
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\connection.py", line 311, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\PokemonAI\PokemonRedExperiments\baselines\run_baseline_parallel_fast.py", line 83, in <module>
    model.learn(total_timesteps=(ep_length)*num_cpu*1000, callback=CallbackList(callbacks))
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\ppo\ppo.py", line 308, in learn
    return super().learn(
           ^^^^^^^^^^^^^^
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 259, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 178, in collect_rollouts
    new_obs, rewards, dones, infos = env.step(clipped_actions)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 197, in step
    return self.step_wait()
           ^^^^^^^^^^^^^^^^
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\vec_env\vec_transpose.py", line 95, in step_wait
    observations, rewards, dones, infos = self.venv.step_wait()
                                          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 130, in step_wait
    results = [remote.recv() for remote in self.remotes]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 130, in <listcomp>
    results = [remote.recv() for remote in self.remotes]
               ^^^^^^^^^^^^^
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "C:\Users\\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\connection.py", line 320, in _recv_bytes
    raise EOFError
EOFError
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.

rousks commented 11 months ago

https://github.com/PWhiddy/PokemonRedExperiments/commit/37a6e8e4af16a89a21d0a4b37330d34412f8fba9

That commit is a temporary fix. I think the puffertank version fixed this properly, but I'll need to see how they did it.

Also may want to fix typo on error messages xD
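
For anyone hitting the same `OSError: [Errno 22]` from `plt.imsave` on Windows, here is a minimal sketch of one defensive approach: make sure the session directory exists, retry the write a few times, and skip the frame instead of letting the worker process die. This is not necessarily what the linked commit or the puffertank version does, and the cause of the error is not established in this thread. `save_with_retry` and its parameters are hypothetical names introduced for illustration.

```python
import time
from pathlib import Path


def save_with_retry(save_fn, path, retries=3, delay=0.1):
    """Call save_fn(path), creating the parent directory first and retrying on OSError.

    Hypothetical helper, not part of the repository. Returns True if a write
    succeeded and False if every attempt failed, so the caller can skip the
    debug frame instead of crashing the environment worker.
    """
    path = Path(path)
    for attempt in range(1, retries + 1):
        try:
            # The session folder may be missing or briefly unavailable.
            path.parent.mkdir(parents=True, exist_ok=True)
            save_fn(path)
            return True
        except OSError as err:
            print(f"save attempt {attempt}/{retries} failed for {path}: {err}")
            time.sleep(delay)
    return False


# Assumed usage inside save_and_print_info (variable names are placeholders):
#   save_with_retry(lambda p: plt.imsave(p, frame), session_dir / "curframe.jpeg")
```

Losing one debug image is harmless for training, whereas an uncaught exception in a `SubprocVecEnv` worker closes the pipe and surfaces as the `BrokenPipeError` / `EOFError` pair in the main process shown above.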