Reward and matched orders all 0 in 'matching' mode.

KL4805 commented 11 months ago

Hi authors,

As per your suggestion I debugged the matching mode. Now it can run. However, after several hundreds of epochs, I still receive 0 rewards and 0 matched orders. Is that normal? If not, what can be modified to fix it?

Here are the system messages:

epoch: 1276
epoch running time:  12.907965183258057
epoch total reward:  0
total orders 57.0000001
matched orders 1e-07
step1:order dispatching: 0.03428983688354492
step2:reaction 0.16186165809631348
step3:bootstrap new orders: 0.2233142852783203
step4:cruise: 10.932743549346924
step4_1:track_recording 0.6921463012695312
step5:update state 0.48050379753112793
step6:offline update 0.37604355812072754
step7: update time 9.584426879882812e-05
epoch: 1277
epoch running time:  13.268260955810547
epoch total reward:  0
total orders 63.0000001
matched orders 1e-07
step1:order dispatching: 0.03837323188781738
step2:reaction 0.16072487831115723
step3:bootstrap new orders: 0.2405104637145996
step4:cruise: 11.218846082687378
step4_1:track_recording 0.6832997798919678
step5:update state 0.5146937370300293
step6:offline update 0.40406155586242676
step7: update time 9.036064147949219e-05

I use the following config.py. I think I just modified 'rl_mode' to matching, and nothing else.

env_params = {
't_initial' :36000,
't_end' : 36100,
'delta_t' : 5,  # s
'vehicle_speed' : 22.788,   # km / h
'repo_speed' : 1, #目前的设定需要与vehicle speed保持一致
'order_sample_ratio' : 1,
'order_generation_mode' : 'sample_from_base',
'driver_sample_ratio' : 1,
'maximum_wait_time_mean' : 300,
'maximum_wait_time_std' : 0,
"maximum_pickup_time_passenger_can_tolerate_mean":float('inf'),  # s
"maximum_pickup_time_passenger_can_tolerate_std":0, # s
"maximum_price_passenger_can_tolerate_mean":float('inf'), # ￥
"maximum_price_passenger_can_tolerate_std":0,  # ￥
'maximal_pickup_distance' : 1,  # km
'request_interval': 5,  #
'cruise_flag' :False,
'delivery_mode':'rg',
'pickup_mode':'rg',
'max_idle_time' : 1,
'cruise_mode': 'random',
'reposition_flag': False,
'eligible_time_for_reposition' : 10, # s
'reposition_mode': '',
'track_recording_flag' : True,
'driver_far_matching_cancel_prob_file' : 'driver_far_matching_cancel_prob',
'input_file_path':'input/dataset.csv',
'request_file_name' : 'input/order', #'toy_requests',
'driver_file_name' : 'input/driver_info',
'road_network_file_name' : 'road_network_information.pickle',
'dispatch_method': 'LD', #LD: lagarange decomposition method designed by Peibo Duan
# 'method': 'instant_reward_no_subway',
'simulator_mode' : 'toy_mode',
'experiment_mode' : 'train',
'driver_num':500,
'side':10,
'price_per_km':5,  # ￥ / km
'road_information_mode':'load',
'price_increasing_percentage': 0.2,
'north_lat': 40.8845,
'south_lat': 40.6968,
'east_lng': -74.0831,
'west_lng': -73.8414,
'rl_mode': 'matching',  # reposition and matching
'method': 'sarsa_no_subway',  #  'sarsa_no_subway' / 'pickup_distance' / 'instant_reward_no_subway'   #  rl for matching
'reposition_method': 'A2C_global_aware',  # A2C, A2C_global_aware, random_cruise, stay  # rl for repositioning
'dayparting': True, # if true, simulator_env will compute information based on time periods in a day, e.g. 'morning', 'afternoon'
}
wait_time_params_dict = {'morning': [2.582, 2.491, 0.026, 1.808, 2.581],
                    'evening': [4.862, 2.485, 0, 1.379, 13.456],
                    'midnight_early': [0, 2.388, 2.972, 2.954, 3.14],
                    'other': [0, 2.017, 2.978, 2.764, 2.973]}

pick_time_params_dict = {'morning': [1.877, 2.018, 2.691, 1.865, 6.683],
                    'evening': [2.673,2.049,2.497,1.736,9.208],
                    'midnight_early': [3.589,2.319,2.185,1.664,9.6],
                    'other': [0,1.886,4.099,3.185,3.636]}

price_params_dict = {'short': [1.245,0.599,10.629,10.305,0.451],
                    'short_medium': [0.451,0.219,19.585,58.407,0.18],
                    'medium_long': [14.411,4.421,11.048,9.228,145],
                    'long': [15.821,3.409,0,16.221,838.587]}

# price_increase_params_dict = {'morning': [0.001,1.181,3.583,4.787,0.001],
#                     'evening': [0,1.21,2.914,5.023,0.013],
#                     'midnight_early': [1.16,0,0,6.366,0],
#                     'other': [0,2.053,0.857,4.666,1.961]}
#  rl for matching
# global variable and parameters for sarsa
START_TIMESTAMP = 36000  # the start timestamp
LEN_TIME_SLICE = 5  # the length of a time slice, 5 minute (300 seconds) in this experiment
LEN_TIME = 100 # 3 hours
NUM_EPOCH = 4001  # 4001 / 3001
FLAG_LOAD = False
sarsa_params = dict(learning_rate=0.005, discount_rate=0.95)  # parameters in sarsa algorithm
#  rl for matching

# rl for repositioning
# hyperparameters for rl
# NUM_EPOCH = 101
STOP_EPOCH = 100
DISCOUNT_FACTOR = 0.95
ACTOR_LR = 0.001
CRITIC_LR = 0.005
ACTOR_STRUCTURE = [64,128] # s[16, 32] for A2C, and [64, 128] for A2C global aware
CRITIC_STRUCTURE = [64,128]
# rl for repositioning

#  rl for matching
# parameters for exploration
INIT_EPSILON = 0.9
FINAL_EPSILON = 0
DECAY = 0.997
PRE_STEP = 0
#  rl for matching

#  rl for matching
# TRAIN_DATE_LIST = ['2015-07-06', '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10',
#                    '2015-07-13', '2015-07-14', '2015-07-15', '2015-07-16', '2015-07-17'
#                    ]
TRAIN_DATE_LIST = ['2015-05-04', '2015-05-05', '2015-05-06', '2015-05-07', '2015-05-08', '2015-05-11', '2015-05-12', '2015-05-13', '2015-05-14', '2015-05-15', '2015-05-18']

TEST_DATE_LIST = ['2015-07-27', '2015-07-28', '2015-07-29', '2015-07-30', '2015-07-31']
#  rl for matching

chenhaoling commented 11 months ago

It seems t that your simulation period is too small. If the value of 't_initial' is 36000 and value of 't_end' is 36100, the total simulation period is only 100s. You can change the settings of these two parameters. For example, if you want to do experiment from 4:00 to 7:00, you can set the 't_initial' as 14400 and set the 't_end' as 25200.

KL4805 commented 11 months ago

Thanks for the reply. I now know what t_init and t_end means. Should I accordingly set LEN_TIME_SLICE and LEN_TIME?

#  rl for matching
# global variable and parameters for sarsa
START_TIMESTAMP = 36000  # the start timestamp
LEN_TIME_SLICE = 5  # the length of a time slice, 5 minute (300 seconds) in this experiment
LEN_TIME = 100 # 3 hours
NUM_EPOCH = 4001  # 4001 / 3001
FLAG_LOAD = False
sarsa_params = dict(learning_rate=0.005, discount_rate=0.95)  # parameters in sarsa algorithm
#  rl for matching

chenhaoling commented 11 months ago

Sure. If the total simulation period is from 4:00 to 7:00, the value of 'LEN_TIME' should be 3 * 3600, which equal to 10800. In addition, the value of 'LEN_TIME_SLICE' should be same as the value of 'request_interval.

KL4805 commented 11 months ago

By setting your values, I encounter the following error.

Traceback (most recent call last):
  File "simulator/main.py", line 237, in <module>
    agent.perceive(dispatch_transitions)
  File "/Users/yilunjin/Desktop/Transportation_Simulator/simulator/sarsa.py", line 110, in perceive
    self.update_q_value_table(s0, s1, rewards[index])
  File "/Users/yilunjin/Desktop/Transportation_Simulator/simulator/sarsa.py", line 57, in update_q_value_table
    self.q_value_table[s0] = (1 - self.learning_rate) * self.q_value_table[s0] + \
KeyError: <utilities.State object at 0x7f8404dd40d0>

My config setting is

env_params = {
't_initial' :14400, # These two parameters mark the start and end of the simulation. Unit:second. 14400 denotes 4am. 
't_end' : 25200, # 7am. 
'delta_t' : 5,  # s
'vehicle_speed' : 22.788,   # km / h
'repo_speed' : 1, #目前的设定需要与vehicle speed保持一致
'order_sample_ratio' : 1,
'order_generation_mode' : 'sample_from_base',
'driver_sample_ratio' : 1,
'maximum_wait_time_mean' : 300,
'maximum_wait_time_std' : 0,
"maximum_pickup_time_passenger_can_tolerate_mean":float('inf'),  # s
"maximum_pickup_time_passenger_can_tolerate_std":0, # s
"maximum_price_passenger_can_tolerate_mean":float('inf'), # ￥
"maximum_price_passenger_can_tolerate_std":0,  # ￥
'maximal_pickup_distance' : 1,  # km
'request_interval': 5,  #
'cruise_flag' :False,
'delivery_mode':'rg',
'pickup_mode':'rg',
'max_idle_time' : 1,
'cruise_mode': 'random',
'reposition_flag': False,
'eligible_time_for_reposition' : 10, # s
'reposition_mode': '',
'track_recording_flag' : True,
'driver_far_matching_cancel_prob_file' : 'driver_far_matching_cancel_prob',
'input_file_path':'input/dataset.csv',
'request_file_name' : 'input/order', #'toy_requests',
'driver_file_name' : 'input/driver_info',
'road_network_file_name' : 'road_network_information.pickle',
'dispatch_method': 'LD', #LD: lagarange decomposition method designed by Peibo Duan
# 'method': 'instant_reward_no_subway',
'simulator_mode' : 'toy_mode',
'experiment_mode' : 'train',
'driver_num':500,
'side':10,
'price_per_km':5,  # ￥ / km
'road_information_mode':'load',
'price_increasing_percentage': 0.2,
'north_lat': 40.8845,
'south_lat': 40.6968,
'east_lng': -74.0831,
'west_lng': -73.8414,
'rl_mode': 'matching',  # reposition and matching
'method': 'sarsa_no_subway',  #  'sarsa_no_subway' / 'pickup_distance' / 'instant_reward_no_subway'   #  rl for matching
'reposition_method': 'A2C_global_aware',  # A2C, A2C_global_aware, random_cruise, stay  # rl for repositioning
'dayparting': True, # if true, simulator_env will compute information based on time periods in a day, e.g. 'morning', 'afternoon'
}
wait_time_params_dict = {'morning': [2.582, 2.491, 0.026, 1.808, 2.581],
                    'evening': [4.862, 2.485, 0, 1.379, 13.456],
                    'midnight_early': [0, 2.388, 2.972, 2.954, 3.14],
                    'other': [0, 2.017, 2.978, 2.764, 2.973]}

pick_time_params_dict = {'morning': [1.877, 2.018, 2.691, 1.865, 6.683],
                    'evening': [2.673,2.049,2.497,1.736,9.208],
                    'midnight_early': [3.589,2.319,2.185,1.664,9.6],
                    'other': [0,1.886,4.099,3.185,3.636]}

price_params_dict = {'short': [1.245,0.599,10.629,10.305,0.451],
                    'short_medium': [0.451,0.219,19.585,58.407,0.18],
                    'medium_long': [14.411,4.421,11.048,9.228,145],
                    'long': [15.821,3.409,0,16.221,838.587]}

# price_increase_params_dict = {'morning': [0.001,1.181,3.583,4.787,0.001],
#                     'evening': [0,1.21,2.914,5.023,0.013],
#                     'midnight_early': [1.16,0,0,6.366,0],
#                     'other': [0,2.053,0.857,4.666,1.961]}
#  rl for matching
# global variable and parameters for sarsa
START_TIMESTAMP = 36000  # the start timestamp
LEN_TIME_SLICE = 5  # should be the same as 'request_interval'
LEN_TIME = 3 * 3600 # 3 hours, equal to t_end - t_init
NUM_EPOCH = 4001  # 4001 / 3001
FLAG_LOAD = False
sarsa_params = dict(learning_rate=0.005, discount_rate=0.95)  # parameters in sarsa algorithm
#  rl for matching

# rl for repositioning
# hyperparameters for rl
# NUM_EPOCH = 101
STOP_EPOCH = 100
DISCOUNT_FACTOR = 0.95
ACTOR_LR = 0.001
CRITIC_LR = 0.005
ACTOR_STRUCTURE = [64,128] # s[16, 32] for A2C, and [64, 128] for A2C global aware
CRITIC_STRUCTURE = [64,128]
# rl for repositioning

#  rl for matching
# parameters for exploration
INIT_EPSILON = 0.9
FINAL_EPSILON = 0
DECAY = 0.997
PRE_STEP = 0
#  rl for matching

#  rl for matching
# TRAIN_DATE_LIST = ['2015-07-06', '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10',
#                    '2015-07-13', '2015-07-14', '2015-07-15', '2015-07-16', '2015-07-17'
#                    ]
TRAIN_DATE_LIST = ['2015-05-04', '2015-05-05', '2015-05-06', '2015-05-07', '2015-05-08', '2015-05-11', '2015-05-12', '2015-05-13', '2015-05-14', '2015-05-15', '2015-05-18']

TEST_DATE_LIST = ['2015-07-27', '2015-07-28', '2015-07-29', '2015-07-30', '2015-07-31']
#  rl for matching

Caffretro commented 11 months ago

Try modifying these settings in config.py: START_TIMESTAMP = 36000 # the start timestamp LEN_TIME_SLICE = 5 # should be the same as 'request_interval' LEN_TIME = 3 * 3600 # 3 hours, equal to t_end - t_init Make sure that the START_TIMESTAMP matches the t_initial above, and the LEM_TIME matches (t_end - t_initial)

KL4805 commented 11 months ago

I set START_TIMESTAMP to 14400, LEN_TIME to 3 * 3600, and t_end to 25200, t_init to 14400), and the error persists.

env_params = {
't_initial' :14400, # These two parameters mark the start and end of the simulation. Unit:second. 14400 denotes 4am. 
't_end' : 25200, # 7am. 
'delta_t' : 5,  # s
'vehicle_speed' : 22.788,   # km / h
'repo_speed' : 1, #目前的设定需要与vehicle speed保持一致
'order_sample_ratio' : 1,
'order_generation_mode' : 'sample_from_base',
'driver_sample_ratio' : 1,
'maximum_wait_time_mean' : 300,
'maximum_wait_time_std' : 0,
"maximum_pickup_time_passenger_can_tolerate_mean":float('inf'),  # s
"maximum_pickup_time_passenger_can_tolerate_std":0, # s
"maximum_price_passenger_can_tolerate_mean":float('inf'), # ￥
"maximum_price_passenger_can_tolerate_std":0,  # ￥
'maximal_pickup_distance' : 1,  # km
'request_interval': 5,  #
'cruise_flag' :False,
'delivery_mode':'rg',
'pickup_mode':'rg',
'max_idle_time' : 1,
'cruise_mode': 'random',
'reposition_flag': False,
'eligible_time_for_reposition' : 10, # s
'reposition_mode': '',
'track_recording_flag' : True,
'driver_far_matching_cancel_prob_file' : 'driver_far_matching_cancel_prob',
'input_file_path':'input/dataset.csv',
'request_file_name' : 'input/order', #'toy_requests',
'driver_file_name' : 'input/driver_info',
'road_network_file_name' : 'road_network_information.pickle',
'dispatch_method': 'LD', #LD: lagarange decomposition method designed by Peibo Duan
# 'method': 'instant_reward_no_subway',
'simulator_mode' : 'toy_mode',
'experiment_mode' : 'train',
'driver_num':500,
'side':10,
'price_per_km':5,  # ￥ / km
'road_information_mode':'load',
'price_increasing_percentage': 0.2,
'north_lat': 40.8845,
'south_lat': 40.6968,
'east_lng': -74.0831,
'west_lng': -73.8414,
'rl_mode': 'matching',  # reposition and matching
'method': 'sarsa_no_subway',  #  'sarsa_no_subway' / 'pickup_distance' / 'instant_reward_no_subway'   #  rl for matching
'reposition_method': 'A2C_global_aware',  # A2C, A2C_global_aware, random_cruise, stay  # rl for repositioning
'dayparting': True, # if true, simulator_env will compute information based on time periods in a day, e.g. 'morning', 'afternoon'
}
wait_time_params_dict = {'morning': [2.582, 2.491, 0.026, 1.808, 2.581],
                    'evening': [4.862, 2.485, 0, 1.379, 13.456],
                    'midnight_early': [0, 2.388, 2.972, 2.954, 3.14],
                    'other': [0, 2.017, 2.978, 2.764, 2.973]}

pick_time_params_dict = {'morning': [1.877, 2.018, 2.691, 1.865, 6.683],
                    'evening': [2.673,2.049,2.497,1.736,9.208],
                    'midnight_early': [3.589,2.319,2.185,1.664,9.6],
                    'other': [0,1.886,4.099,3.185,3.636]}

price_params_dict = {'short': [1.245,0.599,10.629,10.305,0.451],
                    'short_medium': [0.451,0.219,19.585,58.407,0.18],
                    'medium_long': [14.411,4.421,11.048,9.228,145],
                    'long': [15.821,3.409,0,16.221,838.587]}

# price_increase_params_dict = {'morning': [0.001,1.181,3.583,4.787,0.001],
#                     'evening': [0,1.21,2.914,5.023,0.013],
#                     'midnight_early': [1.16,0,0,6.366,0],
#                     'other': [0,2.053,0.857,4.666,1.961]}
#  rl for matching
# global variable and parameters for sarsa
START_TIMESTAMP = 14400  # the start timestamp
LEN_TIME_SLICE = 5  # should be the same as 'request_interval'
LEN_TIME = 3 * 3600 # 3 hours, equal to t_end - t_init
NUM_EPOCH = 4001  # 4001 / 3001
FLAG_LOAD = False
sarsa_params = dict(learning_rate=0.005, discount_rate=0.95)  # parameters in sarsa algorithm
#  rl for matching

# rl for repositioning
# hyperparameters for rl
# NUM_EPOCH = 101
STOP_EPOCH = 100
DISCOUNT_FACTOR = 0.95
ACTOR_LR = 0.001
CRITIC_LR = 0.005
ACTOR_STRUCTURE = [64,128] # s[16, 32] for A2C, and [64, 128] for A2C global aware
CRITIC_STRUCTURE = [64,128]
# rl for repositioning

#  rl for matching
# parameters for exploration
INIT_EPSILON = 0.9
FINAL_EPSILON = 0
DECAY = 0.997
PRE_STEP = 0
#  rl for matching

#  rl for matching
# TRAIN_DATE_LIST = ['2015-07-06', '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10',
#                    '2015-07-13', '2015-07-14', '2015-07-15', '2015-07-16', '2015-07-17'
#                    ]
TRAIN_DATE_LIST = ['2015-05-04', '2015-05-05', '2015-05-06', '2015-05-07', '2015-05-08', '2015-05-11', '2015-05-12', '2015-05-13', '2015-05-14', '2015-05-15', '2015-05-18']

TEST_DATE_LIST = ['2015-07-27', '2015-07-28', '2015-07-29', '2015-07-30', '2015-07-31']
#  rl for matching

KL4805 commented 11 months ago

Never mind, it is working. My fault.

Caffretro commented 11 months ago

I guess you did not save the modifications XD

HKU-Smart-Mobility-Lab / Transportation_Simulator

Reward and matched orders all 0 in 'matching' mode. #29