LucasAlegre / sumo-rl

Reinforcement Learning environments for Traffic Signal Control with SUMO. Compatible with Gymnasium, PettingZoo, and popular RL libraries.
https://lucasalegre.github.io/sumo-rl
MIT License

trial error in experiment ppo_4x4grid #149

Open mas-kho opened 1 year ago

mas-kho commented 1 year ago

Hi. I'm trying to simulate the ppo_4x4grid experiment. I had already fixed many errors, but now I can't understand what the errors here are or how to fix them. I would be very thankful if anyone can help. I would also appreciate it if anyone could tell me how much of this project remains before the final simulation can be obtained from this code. Here are the code and the results:

```python
import os
import sys

import numpy as np
import pandas as pd
import ray
import traci
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.tune.registry import register_env

import sumo_rl
```

```
2023-07-28 17:51:32,779 WARNING deprecation.py:50 -- DeprecationWarning: DirectStepOptimizer has been deprecated. This will raise an error in the future!
```

if "SUMO_HOME" in os.environ: ... tools = os.path.join(os.environ["SUMO_HOME"], "tools") ... sys.path.append(tools) ... else: ... sys.exit("Please declare the environment variable 'SUMO_HOME'") ... if name == "main": ... ray.shutdown() ... ray.init() ... 2023-07-28 17:51:36,792 INFO worker.py:1621 -- Started a local Ray instance. RayContext(dashboard_url='', python_version='3.10.7', ray_version='2.6.1', ray_commit='d68bf04883af2e430fc3a50fd544bb7c84aff2e9', protocol_version=None) env_name="4x4grid"

```python
env_name = "4x4grid"

register_env(
    env_name,
    lambda _: ParallelPettingZooEnv(
        sumo_rl.parallel_env(
            net_file="nets/4x4-Lucas/4x4.net.xml",
            route_file="nets/4x4-Lucas/4x4c1c2c1c2.rou.xml",
            out_csv_name="outputs/4x4grid/ppo",
            use_gui=False,
            num_seconds=80000,
        )
    ),
)
```

```python
config = (
    PPOConfig()
    .environment(env=env_name, disable_env_checking=True)
    .rollouts(num_rollout_workers=4, rollout_fragment_length=128)
    .training(
        train_batch_size=512,
        lr=2e-5,
        gamma=0.95,
        lambda_=0.9,
        use_gae=True,
        clip_param=0.4,
        grad_clip=None,
        entropy_coeff=0.1,
        vf_loss_coeff=0.25,
        sgd_minibatch_size=64,
        num_sgd_iter=10,
    )
    .debugging(log_level="ERROR")
    .framework(framework="torch")
    .resources(num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0")))
)
```

```
2023-07-28 17:51:39,196 WARNING algorithm_config.py:2534 -- Setting exploration_config={} because you set _enable_rl_module_api=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.
```

```python
tune.run(
    "PPO",
    name="PPO",
    stop={"timesteps_total": 100000},
    checkpoint_freq=10,
    local_dir="~/ray_results/" + env_name,
    config=config.to_dict(),
)
```

```
2023-07-28 17:52:58,288 INFO tune.py:666 -- [output] This will use the new output engine with verbosity 2. To disable the new output and use the legacy output engine, set the environment variable RAY_AIR_NEW_OUTPUT=0. For more information, please see https://github.com/ray-project/ray/issues/36949
C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\tune\tune.py:258: UserWarning: Passing a local_dir is deprecated and will be removed in the future. Pass storage_path instead or set the RAY_AIR_LOCAL_CACHE_DIR environment variable instead.
  warnings.warn(
2023-07-28 17:52:58,361 WARNING deprecation.py:50 -- DeprecationWarning: build_tf_policy has been deprecated. This will raise an error in the future!
2023-07-28 17:52:58,374 WARNING deprecation.py:50 -- DeprecationWarning: build_policy_class has been deprecated. This will raise an error in the future!
2023-07-28 17:52:58,443 WARNING algorithm_config.py:2534 -- Setting exploration_config={} because you set _enable_rl_module_api=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.
C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\gymnasium\spaces\box.py:130: UserWarning: WARN: Box bound precision lowered by casting to float32
  gym.logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\gymnasium\utils\passive_env_checker.py:164: UserWarning: WARN: The obs returned by the reset() method was expecting numpy array dtype to be float32, actual type: float64
  logger.warn(
C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\gymnasium\utils\passive_env_checker.py:188: UserWarning: WARN: The obs returned by the reset() method is not within the observation space.
  logger.warn(f"{pre} is not within the observation space.")
2023-07-28 17:52:58,602 WARNING algorithm_config.py:2534 -- Setting exploration_config={} because you set _enable_rl_module_api=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.
╭────────────────────────────────────────────────────────╮
│ Configuration for experiment PPO                        │
├────────────────────────────────────────────────────────┤
│ Search algorithm              BasicVariantGenerator     │
│ Scheduler                     FIFOScheduler             │
│ Number of trials              1                         │
╰────────────────────────────────────────────────────────╯
```

```
View detailed results here: C:\Users\Bahar\ray_results\4x4grid\PPO

To visualize your results with TensorBoard, run:
tensorboard --logdir C:\Users\Bahar\ray_results\PPO
```

```
2023-07-28 17:52:58,704 WARNING algorithm_config.py:2534 -- Setting exploration_config={} because you set _enable_rl_module_api=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.

Trial status: 1 PENDING
Current time: 2023-07-28 17:52:58. Total running time: 0s
Logical resource usage: 5.0/12 CPUs, 0/1 GPUs
╭────────────────────────────────────╮
│ Trial name               status    │
├────────────────────────────────────┤
│ PPO_4x4grid_406c2_00000  PENDING   │
╰────────────────────────────────────╯
```

```
(pid=18404) DeprecationWarning: DirectStepOptimizer has been deprecated. This will raise an error in the future!
(PPO pid=18404) 2023-07-28 17:53:09,058 WARNING algorithm_config.py:2534 -- Setting exploration_config={} because you set _enable_rl_module_api=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.
(PPO pid=18404) 2023-07-28 17:53:09,059 WARNING algorithm_config.py:656 -- Cannot create PPOConfig from given config_dict! Property __stdout_file__ not supported.
(pid=8848) DeprecationWarning: DirectStepOptimizer has been deprecated. This will raise an error in the future!
(pid=9220) DeprecationWarning: DirectStepOptimizer has been deprecated. This will raise an error in the future!
(RolloutWorker pid=8848) Warning: Environment variable SUMO_HOME is not set properly, disabling XML validation. Set 'auto' or 'always' for web lookups.
(RolloutWorker pid=8848) Error: File 'nets/4x4-Lucas/4x4.net.xml' is not accessible (No such file or directory).
(RolloutWorker pid=8848) Quitting (on error).
(RolloutWorker pid=1320) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=1320, ip=127.0.0.1, actor_id=ff1b836b9bfce0a34b02e58901000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000022205D8E590>)
(RolloutWorker pid=1320)   File "python\ray\_raylet.pyx", line 1424, in ray._raylet.execute_task
(RolloutWorker pid=1320)   File "python\ray\_raylet.pyx", line 1364, in ray._raylet.execute_task.function_executor
(RolloutWorker pid=1320)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
(RolloutWorker pid=1320)     return method(__ray_actor, *args, **kwargs)
(RolloutWorker pid=1320)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\util\tracing\tracing_helper.py", line 464, in _resume_span
(RolloutWorker pid=1320)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=1320)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 397, in __init__
(RolloutWorker pid=1320)     self.env = env_creator(copy.deepcopy(self.env_context))
(RolloutWorker pid=1320)   File "<stdin>", line 4, in <lambda>
(RolloutWorker pid=1320)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\pettingzoo\utils\conversions.py", line 17, in par_fn
(RolloutWorker pid=1320)     env = env_fn(**kwargs)
(RolloutWorker pid=1320)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\sumo_rl\environment\env.py", line 32, in env
(RolloutWorker pid=1320)     env = SumoEnvironmentPZ(**kwargs)
(RolloutWorker pid=1320)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\sumo_rl\environment\env.py", line 505, in __init__
(RolloutWorker pid=1320)     self.env = SumoEnvironment(**self._kwargs)
(RolloutWorker pid=1320)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\sumo_rl\environment\env.py", line 149, in __init__
(RolloutWorker pid=1320)     traci.start([sumolib.checkBinary("sumo"), "-n", self._net], label="init_connection" + self.label)
(RolloutWorker pid=1320)   File "C:\Program Files (x86)\Eclipse\Sumo\tools\traci\main.py", line 147, in start
(RolloutWorker pid=1320)     result = init(sumoPort, numRetries, "localhost", label, sumoProcess, doSwitch, traceFile, traceGetters)
(RolloutWorker pid=1320)   File "C:\Program Files (x86)\Eclipse\Sumo\tools\traci\main.py", line 119, in init
(RolloutWorker pid=1320)     return con.getVersion()
(RolloutWorker pid=1320)   File "C:\Program Files (x86)\Eclipse\Sumo\tools\traci\connection.py", line 376, in getVersion
(RolloutWorker pid=1320)     result = self._sendCmd(command, None, None)
(RolloutWorker pid=1320)   File "C:\Program Files (x86)\Eclipse\Sumo\tools\traci\connection.py", line 225, in _sendCmd
(RolloutWorker pid=1320)     return self._sendExact()
(RolloutWorker pid=1320)   File "C:\Program Files (x86)\Eclipse\Sumo\tools\traci\connection.py", line 135, in _sendExact
(RolloutWorker pid=1320)     raise FatalTraCIError("connection closed by SUMO")
(RolloutWorker pid=1320) traci.exceptions.FatalTraCIError: connection closed by SUMO
(PPO pid=18404) 2023-07-28 17:53:23,332 ERROR actor_manager.py:500 -- Ray error, taking actor 1 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=11480, ip=127.0.0.1, actor_id=c6a956ce57c8ee3338a6239201000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x00000156A27CE4A0>)
(PPO pid=18404) 2023-07-28 17:53:23,333 ERROR actor_manager.py:500 -- Ray error, taking actor 2 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=1320, ip=127.0.0.1, actor_id=ff1b836b9bfce0a34b02e58901000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000022205D8E590>)
(PPO pid=18404) 2023-07-28 17:53:23,334 ERROR actor_manager.py:500 -- Ray error, taking actor 3 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=9220, ip=127.0.0.1, actor_id=0cda4c8b9943b8d79623286f01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000021962F4E560>)
(PPO pid=18404) 2023-07-28 17:53:23,335 ERROR actor_manager.py:500 -- Ray error, taking actor 4 out of service. The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=8848, ip=127.0.0.1, actor_id=1b9f3448f380f5afe62c139f01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000001E09BBAE560>)
(PPO pid=18404) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=18404, ip=127.0.0.1, actor_id=4669252480053ecd755caaa001000000, repr=PPO)
(PPO pid=18404)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 227, in _setup
(PPO pid=18404)     self.add_workers(
(PPO pid=18404)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 593, in add_workers
(PPO pid=18404)     raise result.get()
(PPO pid=18404)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\utils\actor_manager.py", line 481, in __fetch_result
(PPO pid=18404)     result = ray.get(r)
(PPO pid=18404)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\auto_init_hook.py", line 24, in auto_init_wrapper
(PPO pid=18404)     return fn(*args, **kwargs)
(PPO pid=18404)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
(PPO pid=18404)     return func(*args, **kwargs)
(PPO pid=18404)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\worker.py", line 2495, in get
(PPO pid=18404)     raise value
(PPO pid=18404) ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=11480, ip=127.0.0.1, actor_id=c6a956ce57c8ee3338a6239201000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x00000156A27CE4A0>)
(PPO pid=18404)
(PPO pid=18404) During handling of the above exception, another exception occurred:
(PPO pid=18404)
(PPO pid=18404) ray::PPO.__init__() (pid=18404, ip=127.0.0.1, actor_id=4669252480053ecd755caaa001000000, repr=PPO)
(PPO pid=18404)     super().__init__(
(PPO pid=18404)     self.setup(copy.deepcopy(self.config))
(PPO pid=18404)   File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 639, in setup
(PPO pid=18404)     self.workers = WorkerSet(
(PPO pid=18404)     raise e.args[0].args[2]
2023-07-28 17:53:23,483 ERROR tune_controller.py:911 -- Trial task failed for trial PPO_4x4grid_406c2_00000
Traceback (most recent call last):
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\air\execution\_internal\event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\worker.py", line 2495, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=18404, ip=127.0.0.1, actor_id=4669252480053ecd755caaa001000000, repr=PPO)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 227, in _setup
    self.add_workers(
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 593, in add_workers
    raise result.get()
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\utils\actor_manager.py", line 481, in __fetch_result
    result = ray.get(r)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\worker.py", line 2495, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=11480, ip=127.0.0.1, actor_id=c6a956ce57c8ee3338a6239201000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x00000156A27CE4A0>)
  File "python\ray\_raylet.pyx", line 1424, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1364, in ray._raylet.execute_task.function_executor
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\util\tracing\tracing_helper.py", line 464, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 397, in __init__
    self.env = env_creator(copy.deepcopy(self.env_context))
  File "<stdin>", line 4, in <lambda>
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\pettingzoo\utils\conversions.py", line 17, in par_fn
    env = env_fn(**kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\sumo_rl\environment\env.py", line 32, in env
    env = SumoEnvironmentPZ(**kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\sumo_rl\environment\env.py", line 505, in __init__
    self.env = SumoEnvironment(**self._kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\sumo_rl\environment\env.py", line 149, in __init__
    traci.start([sumolib.checkBinary("sumo"), "-n", self._net], label="init_connection" + self.label)
  File "C:\Program Files (x86)\Eclipse\Sumo\tools\traci\main.py", line 147, in start
    result = init(sumoPort, numRetries, "localhost", label, sumoProcess, doSwitch, traceFile, traceGetters)
  File "C:\Program Files (x86)\Eclipse\Sumo\tools\traci\main.py", line 119, in init
    return con.getVersion()
  File "C:\Program Files (x86)\Eclipse\Sumo\tools\traci\connection.py", line 376, in getVersion
    result = self._sendCmd(command, None, None)
  File "C:\Program Files (x86)\Eclipse\Sumo\tools\traci\connection.py", line 225, in _sendCmd
    return self._sendExact()
  File "C:\Program Files (x86)\Eclipse\Sumo\tools\traci\connection.py", line 135, in _sendExact
    raise FatalTraCIError("connection closed by SUMO")
traci.exceptions.FatalTraCIError: connection closed by SUMO

During handling of the above exception, another exception occurred:

ray::PPO.__init__() (pid=18404, ip=127.0.0.1, actor_id=4669252480053ecd755caaa001000000, repr=PPO)
  File "python\ray\_raylet.pyx", line 1418, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1498, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1424, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1364, in ray._raylet.execute_task.function_executor
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\util\tracing\tracing_helper.py", line 464, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 517, in __init__
    super().__init__(
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\tune\trainable\trainable.py", line 169, in __init__
    self.setup(copy.deepcopy(self.config))
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\util\tracing\tracing_helper.py", line 464, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 639, in setup
    self.workers = WorkerSet(
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 179, in __init__
    raise e.args[0].args[2]
traci.exceptions.FatalTraCIError: connection closed by SUMO
2023-07-28 17:53:23,511 WARNING tune.py:1122 -- Trial Runner checkpointing failed: Sync process failed: GetFileInfo() yielded path 'C:/Users/Bahar/ray_results/PPO/PPO_4x4grid_233c9_00000_0_2023-07-28_11-39-55', which is outside base dir 'C:\Users\Bahar\ray_results\PPO'

Trial status: 1 ERROR
Current time: 2023-07-28 17:53:23. Total running time: 24s
Logical resource usage: 0/12 CPUs, 0/1 GPUs
╭────────────────────────────────────╮
│ Trial name               status    │
├────────────────────────────────────┤
│ PPO_4x4grid_406c2_00000  ERROR     │
╰────────────────────────────────────╯
```

```
Number of errored trials: 1
╭───────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name               # failures   error file                                                       │
├───────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PPO_4x4grid_406c2_00000  1            C:\Users\Bahar\ray_results\PPO\PPO_4x4grid_406c2_00000_0_2023-07-28_17-52-58\error.txt │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────╯
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Bahar\AppData\Local\Programs\Python\Python310\lib\site-packages\ray\tune\tune.py", line 1142, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [PPO_4x4grid_406c2_00000])
```

jenniferhahn commented 1 year ago

Your path seems to be the issue here - see `Error: File 'nets/4x4-Lucas/4x4.net.xml' is not accessible (No such file or directory).` The rollout workers can't find the net file under that relative path, so SUMO quits immediately and TraCI then reports `connection closed by SUMO`.
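A minimal sketch of that fix, assuming the code is saved as a script inside a checkout of the sumo-rl repository (the `base_dir` variable and the existence checks are illustrative, not part of the repo's example; `os.path.abspath("nets/...")` works too if you always launch from the repo root): resolve the net and route files to absolute paths in the driver before registering the environment, so rollout worker processes that don't share the driver's working directory can still open them. The worker log above also shows `Warning: Environment variable SUMO_HOME is not set properly`, so make sure SUMO_HOME is set system-wide (e.g. in the Windows environment variables dialog), not just in the current shell, so that spawned workers inherit it.

```python
import os

import sumo_rl
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.tune.registry import register_env

# Resolve files relative to this script's location and fail fast in the
# driver instead of deep inside a rollout worker. base_dir is an assumption:
# point it at wherever the sumo-rl nets/ folder actually lives.
base_dir = os.path.dirname(os.path.abspath(__file__))
net_file = os.path.join(base_dir, "nets", "4x4-Lucas", "4x4.net.xml")
route_file = os.path.join(base_dir, "nets", "4x4-Lucas", "4x4c1c2c1c2.rou.xml")

for f in (net_file, route_file):
    if not os.path.exists(f):
        raise FileNotFoundError(f)

register_env(
    "4x4grid",
    lambda _: ParallelPettingZooEnv(
        sumo_rl.parallel_env(
            net_file=net_file,  # absolute path, valid in every worker process
            route_file=route_file,
            out_csv_name=os.path.join(base_dir, "outputs", "4x4grid", "ppo"),
            use_gui=False,
            num_seconds=80000,
        )
    ),
)
```

With absolute paths the `File ... is not accessible` error from SUMO should disappear; any remaining `FatalTraCIError` would then point at a genuinely different problem rather than a missing file.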