flow-project / flow

Computational framework for reinforcement learning in traffic control
MIT License

Adding many RL vehicles in a given scenario, and activating render only after a certain point #526

Open pnp91 opened 5 years ago

pnp91 commented 5 years ago

Hello, I have added several more RL vehicles, distributed through the network, to the example "cooperative merge". But I have noticed that the agents I added do not accelerate much compared with the 'human' vehicles. Because of this, there are many collisions, which leads to teleporting, and the agent does not train well. I am training with A3C and plan to continue training with the SAC algorithm.

  1. I have tried the SumoCarFollowingParams 'speed_factor' and 'speed_dev' and the EnvParams 'max_accel'. I also tried training the agent with 'warmup_steps', hoping that the RL vehicles would increase their speed afterwards, but they did not accelerate. Is it possible to make my RL vehicles run at the speed of the human vehicles, to avoid collisions and train better?
  2. Is there a way to turn on rendering only at the end of training, or after a certain number of training steps? If I use [sim=SumoParams(render=True)], it renders from the first training step, and of course it is known that training does not go well when render is True.

My code is as follows:

vehicles.add(
    veh_id='rl2',
    acceleration_controller=(RLController, {}),
    lane_change_controller=(SimLaneChangeController, {}),
    routing_controller=(ContinuousRouter, {}),
    num_vehicles=1,
    car_following_params=SumoCarFollowingParams(
        minGap=0.02,
        tau=0.2,
        sigma=0.9,
        speed_mode="obey_safe_speed",
        speedFactor="normc(1,0.1,0.2,150)",
        speedDev="10",
    ),
    lane_change_params=SumoLaneChangeParams())
flow_params = dict(
    # name of the experiment
    exp_tag='cooperative_merge_Changed_rl_values',

    # name of the flow environment the experiment is running on
    env_name='TwoLoopsMergePOEnv',

    # name of the scenario class the experiment is running on
    scenario='TwoLoopsOneMergingScenario',

    # simulator that is used by the experiment
    simulator='traci',

    # sumo-related parameters (see flow.core.params.SumoParams)
    sim=SumoParams(
        sim_step=0.1,
        render=False,
    ),

    # environment related parameters (see flow.core.params.EnvParams)
    env=EnvParams(
        horizon=HORIZON,
        # note: warmup_steps is a direct EnvParams argument in Flow,
        # not an additional_params entry
        warmup_steps=200,
        additional_params={
            'max_accel': 100,
            'max_decel': 3,
            'target_velocity': 10,
            'n_preceding': 2,
            'n_following': 2,
            'n_merging_in': 2,
        },
    ),

    # network-related parameters (see flow.core.params.NetParams and the
    # scenario's documentation or ADDITIONAL_NET_PARAMS component)
    net=NetParams(
        no_internal_links=False,
        additional_params={
            'ring_radius': 50,
            'lane_length': 75,
            'inner_lanes': 1,
            'outer_lanes': 1,
            'speed_limit': 30,
            'resolution': 40,
        },
    ),

    # vehicles to be placed in the network at the start of a rollout (see
    # flow.core.vehicles.Vehicles)
    veh=vehicles,

    # parameters specifying the positioning of vehicles upon initialization/
    # reset (see flow.core.params.InitialConfig)
    initial=InitialConfig(
        x0=50,
        spacing='uniform',
        additional_params={
            'merge_bunching': 0,
        },
    ),
)
def setup_exps():

    alg_run = 'A3C'

    agent_cls = get_agent_class(alg_run)
    config = agent_cls._default_config.copy()
    config['num_workers'] = N_CPUS
    config['train_batch_size'] = HORIZON * N_ROLLOUTS
    config['gamma'] = 0.999  # discount rate
    config['model'].update({'fcnet_hiddens': [16, 16, 16]})
    config['lambda'] = 0.97
    config['clip_actions'] = False  
    config['horizon'] = HORIZON

    # save the flow params for replay
    flow_json = json.dumps(
        flow_params, cls=FlowParamsEncoder, sort_keys=True, indent=4)
    config['env_config']['flow_params'] = flow_json
    config['env_config']['run'] = alg_run

    #flow_params = variant['flow_params']
    #flow_params = get_flow_params(variant)
    #flow_params['sim'].render = True
    #print(flow_params)

    create_env, gym_name = make_create_env(params=flow_params, version=0)

    # Register as rllib env
    register_env(gym_name, create_env)
    return alg_run, gym_name, config

if __name__ == '__main__':
    alg_run, gym_name, config = setup_exps()
    ray.init(num_cpus=N_CPUS+1, redirect_output=False)
    trials = run_experiments({
        flow_params['exp_tag']: {
            'run': alg_run,
            'env': gym_name,
            'config': {
                **config
            },
            'checkpoint_freq': 20,
            'max_failures': 999,
            'stop': {
                'training_iteration': 200,
            },
        }
    })

OS: Ubuntu 18.04, Flow Version: 0.3.0, SUMO Version: v0_31_0-812-g1d4338ab80

AboudyKreidieh commented 5 years ago

Hi @pnp91, in response to your questions:

  1. The RL vehicle might be colliding because your minGap value is too small. I would recommend increasing it to a larger value, say around 2. You could also consider increasing your tau value to around 0.5 or 1. Tuning these values, and possibly others, may be key to avoiding collisions.
  2. To visualize the policy after training (with the renderer on), you will want to use the flow/visualizer_rllib.py script; see: https://flow.readthedocs.io/en/latest/visualizing.html
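
A minimal sketch of the change suggested in point 1, reusing the parameter names from the code above (the values 2.0 and 1.0 are the suggested starting points, not tuned results):

```python
# Hypothetical adjustment to the car-following parameters: a larger
# minGap and tau give the RL vehicle more headroom to react and brake
car_following_params=SumoCarFollowingParams(
    minGap=2.0,   # was 0.02
    tau=1.0,      # was 0.2; 0.5 may also be worth trying
    speed_mode="obey_safe_speed",
),
```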
pnp91 commented 5 years ago

Hi @AboudyKreidieh :

  1. I have already tried these parameters and there was not much visible difference. Hence I also tried speed_factor and speed_dev, but that did not help.
  2. If I understand correctly, the visualizer only replays a single checkpoint, not the run from a particular point until the end of training, right? I have already used 'visualize' after training completed, but only to view particular checkpoints, not to watch from a specified training point until the end of training.
AboudyKreidieh commented 5 years ago

hi @pnp91:

  1. You can avoid collisions altogether by setting speed_mode to "all_checks"; however, that may lead to some unrealistic behavior by the vehicles on the main highway. Other values for speed_mode, say 9, may also do the trick for you.
  2. I don't think SUMO supports rendering only a subset of simulations, since you must launch a binary that either includes the GUI or not before anything runs. You may want to consider initializing the vehicle states to something desirable and running the simulation from there with the renderer turned on.
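
The integer speed_mode values mentioned above are bitmasks over SUMO's safety checks. As a rough illustration (bit meanings taken from the SUMO TraCI setSpeedMode documentation; verify against your SUMO version), a value like 9 keeps only the safe-speed and right-of-way checks:

```python
# Decode a SUMO speed-mode bitmask into the safety checks it enables.
# Bit meanings per the SUMO TraCI setSpeedMode docs; 31 enables all checks.
CHECKS = [
    (1, "regard safe speed"),
    (2, "regard maximum acceleration"),
    (4, "regard maximum deceleration"),
    (8, "regard right of way at intersections"),
    (16, "brake hard to avoid passing a red light"),
]

def decode_speed_mode(mode):
    """Return the list of safety checks enabled by a speed-mode integer."""
    return [name for bit, name in CHECKS if mode & bit]

# speed_mode=9 -> 0b01001 -> safe speed + right of way only
print(decode_speed_mode(9))
# speed_mode=31 ("all_checks") -> every check enabled
print(len(decode_speed_mode(31)))
```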
jingyanping commented 5 years ago

When I run flow/examples/rllib/stabilizing_the_ring.py, it generates a series of files such as 'checkpoint_20', 'checkpoint_40', ..., 'progress.csv', and 'result.json'. When training finished, I did not know how to look at my training results. From the documentation I found two ways to do this:

First: running python ./visualizer_rllib.py /ray_results/result_dir 1 at the terminal, but this command reports an error: can't open file....

Second: tensorboard --logdir=~/ray_results, but the terminal only displays "TensorBoard 1.9.0 at http://ubuntu:6006 (Press CTRL+C to quit)". So I want to know how I should look at the results after my training. @pnp91 @eugenevinitsky
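
For reference, the two viewing paths look roughly like this (the paths and checkpoint number are illustrative; substitute your own result directory — a "can't open file" error from the first command usually means the path to visualizer_rllib.py or the result directory is wrong relative to your current directory):

```shell
# Replay a trained policy with the SUMO GUI; the trailing number is the
# checkpoint to load (e.g. 20, 40, ...), and the path must point at the
# actual experiment directory under ~/ray_results
python flow/visualizer_rllib.py ~/ray_results/<exp_tag>/<run_dir> 20

# Inspect training curves (reward, episode length, etc.) by opening the
# reported URL (e.g. http://localhost:6006) in a browser after launching
tensorboard --logdir=~/ray_results
```

The tensorboard message quoted above is not an error: it means the server is running, and the results are viewed by opening that address in a browser.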