dcgym / iroko

A platform to test reinforcement learning policies in the datacenter setting.
Apache License 2.0

iroko_env problem #30

Closed wushilan closed 4 years ago

wushilan commented 4 years ago

Hi Iroko team, thanks for Iroko; it is powerful and easy to use. However, when I run run_basic.py with a large number of timesteps, I encounter the following error. I hit the same problem when training my RL algorithm if the number of episodes is too large. Do you know how to solve it? Best wishes!

wushilan commented 4 years ago

kong@kong-VirtualBox:~/iroko$ sudo -E python3 run_basic.py
INFO:Loading environment dc_gym.env_iroko
INFO:Loading topology dc_gym.topos.topo_dumbbell
INFO:Host h0 IP 10.2.0.1
INFO:Host h1 IP 10.1.0.1
INFO:Host h2 IP 10.2.0.2
INFO:Host h3 IP 10.1.0.2
/home/kong/.local/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
INFO:Lenb Setting action space
INFO:from [0.001 0.001 0.001 0.001]
INFO:to [1. 1. 1. 1.]
INFO:Lenb Stopping environment...
INFO:Lenb Done with stopping.
INFO:Lenb Starting environment...
INFO:Lenb Starting network manager...
Unable to contact the remote controller at 127.0.0.1:6653
Unable to contact the remote controller at 127.0.0.1:6633
Setting remote controller to 127.0.0.1:6653
INFO:Lenb Starting traffic generator...
INFO:Starting servers
INFO:Starting controllers
INFO:Starting traffic
INFO:Loading file: /home/kong/iroko/dc_gym/inputs/dumbbell/incast_2
INFO:Starting load-generators
INFO:Lenb Done with resetting.
Generator Finished. Simulation over. Clearing dc_env...
INFO:Lenb Closing environment...
INFO:Lenb Stopping all state collectors...
INFO:QueueCollector: Received termination signal! Exiting..
INFO:StatsCollector: Received termination signal! Exiting..
INFO:Lenb Shutting down bandwidth control...
INFO:PolicyEnforcer: Received termination signal! Exiting..
INFO:Lenb Shutting down data sampling.
INFO:SampleCollector: Received termination signal! Exiting..
INFO:Lenb Shutting down generators...
INFO:
INFO:Stopping traffic processes
--- Logging error ---
Traceback (most recent call last):
  File "/home/kong/iroko/dc_gym/utils.py", line 74, in kill_processes
    os.kill(proc.pid, 15)
ProcessLookupError: [Errno 3] No such process

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py", line 994, in emit
    msg = self.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 840, in format
    return fmt.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 577, in format
    record.message = record.getMessage()
  File "/usr/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "run_basic.py", line 74, in <module>
    init()
  File "run_basic.py", line 68, in init
    test_run(INPUT_DIR, output_dir, ARGS.env, ARGS.topo)
  File "run_basic.py", line 55, in test_run
    dc_env.close()
  File "/home/kong/iroko/dc_gym/env_iroko.py", line 229, in close
    self.traffic_gen.close()
  File "/home/kong/iroko/dc_gym/iroko_traffic.py", line 46, in close
    self._stop_traffic()
  File "/home/kong/iroko/dc_gym/iroko_traffic.py", line 221, in _stop_traffic
    dc_utils.kill_processes(self.traffic_procs)
  File "/home/kong/iroko/dc_gym/utils.py", line 78, in kill_processes
    log.info("Could not kill process %d: " % proc.pid, e)
Message: 'Could not kill process 2946: '
Arguments: (ProcessLookupError(3, 'No such process'),)
--- Logging error ---
Traceback (most recent call last):
  File "/home/kong/iroko/dc_gym/utils.py", line 74, in kill_processes
    os.kill(proc.pid, 15)
ProcessLookupError: [Errno 3] No such process

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py", line 994, in emit
    msg = self.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 840, in format
    return fmt.format(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 577, in format
    record.message = record.getMessage()
  File "/usr/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "run_basic.py", line 74, in <module>
    init()
  File "run_basic.py", line 68, in init
    test_run(INPUT_DIR, output_dir, ARGS.env, ARGS.topo)
  File "run_basic.py", line 55, in test_run
    dc_env.close()
  File "/home/kong/iroko/dc_gym/env_iroko.py", line 229, in close
    self.traffic_gen.close()
  File "/home/kong/iroko/dc_gym/iroko_traffic.py", line 46, in close
    self._stop_traffic()
  File "/home/kong/iroko/dc_gym/iroko_traffic.py", line 221, in _stop_traffic
    dc_utils.kill_processes(self.traffic_procs)
  File "/home/kong/iroko/dc_gym/utils.py", line 78, in kill_processes
    log.info("Could not kill process %d: " % proc.pid, e)
Message: 'Could not kill process 2947: '
Arguments: (ProcessLookupError(3, 'No such process'),)
INFO:
INFO:Stopping services
INFO:Writing collected data to disk
INFO:Lenb Stopping network.
INFO:Removing interfaces and restoring all network state.
INFO:Deleting the virtual network
INFO:Done saving statistics...
INFO:Successfully deleted the virtual network
INFO:Lenb Done with destroying myself.

wushilan commented 4 years ago

I found that the error appears whenever the code reaches env.reset() or env.close().
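For reference, the "Logging error" in the traceback above seems to come from the call shown at utils.py line 78: the message is already %-formatted with proc.pid, and the exception is then passed as an extra logging argument with no placeholder left to consume it. A minimal sketch of a corrected call follows. It assumes only that each entry has a pid attribute; the logger name is made up for illustration, and this is not necessarily the fix that was applied in the repository.

```python
import logging
import os
import signal

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("kill_sketch")


def kill_processes(procs):
    """Send SIGTERM to each process, tolerating ones that already exited."""
    for proc in procs:
        try:
            os.kill(proc.pid, signal.SIGTERM)
        except ProcessLookupError as e:
            # One placeholder per argument; logging does the formatting itself.
            log.warning("Could not kill process %d: %s", proc.pid, e)
```

Passing the values as logging arguments rather than pre-formatting the string also defers formatting until the record is actually emitted.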

fruffy commented 4 years ago

Thanks for pointing this out; you uncovered a couple of issues. The warning itself is harmless, but it indicates that the traffic generator died before the environment terminated. The reason is that in this line here we only run the traffic generators for 60 seconds. Those 60 seconds make up one episode of traffic generation, after which the environment needs to be reset. We introduced this to align the environment more closely with the OpenAI and Ray model of episodes, where reset is called after each episode. Ideally, the duration should be parametrized based on the input files we provide.

This change makes the timesteps largely meaningless, and I neglected to update run_basic.py to reflect that. I have pushed some changes to run_basic.py.

I hope this did not cause you too much trouble.
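The episode model described above can be sketched as a loop that calls reset() whenever an episode ends, instead of stepping for a fixed number of timesteps. StubEnv below is a hypothetical stand-in that ends an episode after a fixed number of steps; it follows the classic Gym reset()/step()/close() convention and is not the actual Iroko environment or the updated run_basic.py.

```python
class StubEnv:
    """Hypothetical stand-in for the environment; an episode ends after episode_len steps."""

    def __init__(self, episode_len=5):
        self.episode_len = episode_len
        self.steps = 0
        self.resets = 0

    def reset(self):
        self.steps = 0
        self.resets += 1
        return 0.0  # dummy observation

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.episode_len
        return 0.0, 0.0, done, {}  # obs, reward, done, info

    def close(self):
        pass


def run(env, total_steps):
    """Step the environment, resetting it each time an episode finishes."""
    obs = env.reset()
    for _ in range(total_steps):
        obs, reward, done, info = env.step(None)
        if done:
            # The episode (one round of traffic generation) is over; start a new one.
            obs = env.reset()
    env.close()
```

With this structure the total number of timesteps can be arbitrarily large, because no episode is stepped past its end.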

wushilan commented 4 years ago

Thanks for your reply. I modified my code and ran run_basic.py. Its output is:
INFO:Loading environment dc_gym.env_iroko
INFO:Loading topology dc_gym.topos.topo_dumbbell
INFO:Host h0 IP 10.2.0.1
INFO:Host h1 IP 10.1.0.1
INFO:Host h2 IP 10.2.0.2
INFO:Host h3 IP 10.1.0.2
INFO:gy2A Setting action space
INFO:from [0.001 0.001 0.001 0.001]
INFO:to [1. 1. 1. 1.]
INFO:gy2A Stopping environment...
INFO:gy2A Done with stopping.
INFO:gy2A Starting environment...
INFO:gy2A Starting network manager...
Unable to contact the remote controller at 127.0.0.1:6653
Unable to contact the remote controller at 127.0.0.1:6633
Setting remote controller to 127.0.0.1:6653
INFO:gy2A Starting traffic generator...
INFO:Starting servers
INFO:Starting controllers
INFO:Starting traffic
INFO:Loading file: /home/kong/iroko/dc_gym/inputs/dumbbell/incast_2
INFO:Starting load-generators
INFO:gy2A Done with resetting.
INFO:gy2A Stopping environment...
INFO:gy2A Stopping traffic
INFO:
INFO:Stopping traffic processes
WARNING:Could not kill process 2671: [Errno 3] No such process
WARNING:Could not kill process 2672: [Errno 3] No such process
INFO:gy2A Done with stopping.
INFO:gy2A Starting environment...
INFO:Starting traffic
INFO:Loading file: /home/kong/iroko/dc_gym/inputs/dumbbell/incast_2
INFO:Starting load-generators
INFO:gy2A Done with resetting.
INFO:Generator Finished. Simulation over. Clearing dc_env...
INFO:gy2A Closing environment...
INFO:gy2A Stopping all state collectors...
INFO:QueueCollector: Received termination signal! Exiting..
INFO:StatsCollector: Received termination signal! Exiting..
INFO:gy2A Shutting down bandwidth control...
INFO:PolicyEnforcer: Received termination signal! Exiting..
INFO:gy2A Shutting down data sampling.
INFO:SampleCollector: Received termination signal! Exiting..
INFO:gy2A Shutting down generators...
INFO:
INFO:Stopping traffic processes
WARNING:Could not kill process 4468: [Errno 3] No such process
WARNING:Could not kill process 4469: [Errno 3] No such process
INFO:
INFO:Stopping services
INFO:gy2A Stopping network.
INFO:Removing interfaces and restoring all network state.
INFO:Deleting the virtual network
INFO:Writing collected data to disk
INFO:Done saving statistics...
INFO:Successfully deleted the virtual network
INFO:gy2A Done with destroying myself.

Is this correct?

fruffy commented 4 years ago

Yes, this looks about right. The warning appears because the traffic generator process has already terminated by the time we try to kill it, which is the expected behaviour.
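If the warning is considered noise, one option is to signal only the processes that are still alive. The sketch below assumes the traffic processes are subprocess.Popen objects, which the thread does not confirm; stop_running is a hypothetical helper name, not part of dc_gym.

```python
import subprocess
import sys


def stop_running(procs):
    """Terminate only the processes that are still alive; return how many were signalled."""
    signalled = 0
    for proc in procs:
        if proc.poll() is None:  # None means the child has not exited yet
            proc.terminate()
            signalled += 1
    for proc in procs:
        proc.wait()  # reap every child so none is left as a zombie
    return signalled
```

A process can still exit between the poll() check and the terminate() call, so treating a failed kill as a warning rather than an error remains a reasonable fallback.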

wushilan commented 4 years ago

Thank you. Best wishes