arup-group / gelato

Gelato turns MATSim outputs into transport sustainability KPI metrics
GNU General Public License v3.0

Speed KPI - more than 24 hours, missing links & data structure #24

Open val-ismaili opened 8 months ago

val-ismaili commented 8 months ago

I found a number of issues with the speed KPI when comparing it against Elara outputs for TE.

For reference:

time java -Xmx120g -jar target/gelato-0.0.1-alpha-with-dependencies-eb19697.jar \
-mc /mnt/efs/simulations_refresh/10pc/workdist_5km_20230826/200/output_config.xml \
-mo /mnt/efs/simulations_refresh/10pc/workdist_5km_20230826/200 \
-o /mnt/efs/analysis/vi/gelato/baseline

1) The data structure should be improved by ordering the hour columns numerically from 0 to 23 (currently they appear as 0, 1, 10, 11, 12, etc., which looks like string sorting).

2) The speed table for TE extends to hour 31!

3) If the speed of a link is 0 across all time periods (which presumably means no traffic), the link is not included in kpi-speed.csv. We should still include these links, as this is a parameter and provides useful context.

4) There are 18,473 links (see the extract below) that are included in the Gelato output but not found in the Elara logs. Most of these seem to be links that start and end at the same node, and they are written out as 0 across all time periods in Gelato. There are some exceptions, however, which appear to connect different nodes and have speeds associated with them.

['5177455651423483735_5177455651423483735',
 '5221404044885987507_5221404044885987507',
 '5177430235961007913_5177430235961007913',
 '5177125817509627245_5177125817509627245',
 '5177024736422142319_5177024736422142319',
 '5177174031462487069_5177174031462487069',
 '5177138868980649475_5177174043054889671',
 '5177347075229862649_5177347075229862649',
 '5177343675762528811_5177343675762528811',
 '5177043119941093583_5177043119941093583',
 '5176679471526675737_5176679471526675737',
 '5177410312476284525_5177410312476284525',
 '5221396608680638307_5221396608680638307',
 '5177092587874821721_5177092587874821721',
 '5177100767215689927_5177100767215689927',
 '5177345379506603341_5177380946716280041',
 '5177456069152357911_5177456069152357911',
 '5176983071155334701_5176983071155334701',
 '5177288984360657131_5177288984360657131',
 '5177419897909524023_5177419897909524023',
 '5177174099055637673_5177174099055637673',
 '5177147151296536559_5177147151296536559',
 '5177400179844327273_5177400179844327273',
 '5176696600309477565_5176696600309477565',
 '5176960865348689221_5176960865348689221',
...
 '5177002355861982197_5177002355861982197',
 '5177290203876150117_5177290203876150117',
 '5177112472063736515_5177117395690362335',
 '5177359920442072303_5177359920442072303',
 ...]
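
To separate the two cases described in point 4, here is a minimal Python sketch. It assumes, as the extract suggests, that each link ID has the form `<from_node>_<to_node>`; that format is inferred from the IDs and not confirmed anywhere in this thread. The function name `split_self_loops` is hypothetical, and the sample IDs are copied from the extract above.

```python
def split_self_loops(link_ids):
    """Split link IDs into self-loops (same node at both ends) and the rest.

    Assumes IDs look like '<from_node>_<to_node>'. This format is inferred
    from the extract, not confirmed by Gelato or Elara.
    """
    self_loops, others = [], []
    for link_id in link_ids:
        # Split on the first underscore only; IDs are kept as strings.
        from_node, _, to_node = link_id.partition("_")
        (self_loops if from_node == to_node else others).append(link_id)
    return self_loops, others


sample = [
    "5177455651423483735_5177455651423483735",
    "5177138868980649475_5177174043054889671",
    "5177345379506603341_5177380946716280041",
    "5177456069152357911_5177456069152357911",
]
self_loops, others = split_self_loops(sample)
print(len(self_loops), "self-loops;", len(others), "links between different nodes")
```

Running this over the full list of 18,473 IDs would give a count of the exceptions, which are probably the ones worth checking by hand first.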

divyasharma-arup commented 8 months ago

@syhwawa, two issues here:

  1. What percentage of agents are still travelling after 24 hours in the Paris East model? I'm curious to understand the magnitude of stuck agents.
  2. Why would Gelato report on links that are not in the Elara logs for TE? Should these links be excluded for some reason, or were they wrongfully excluded from TE? Or is there an issue with the data type?

syhwawa commented 8 months ago

Hey @divyasharma-arup, I've done some analysis on the Paris East speed outputs.

  1. What is the percentage of agents that are still travelling after 24 hours in the Paris East model? Curious to understand the magnitude of stuck agents.

In the MATSim configuration, we set endTime = 30:00:00, so the simulation runs past 24 hours. Approximately 3% of agents were engaged in activities after 24 hours. Here's the breakdown by activity type (in %) for agents with overnight activities:

outside      77.111975
home         15.073263
other         2.880789
leisure       2.392366
shop          1.234624
work          1.139653
education     0.167330

For the TE model, around 0.2% of agents had activities extending beyond 24 hours:

delivery            30.594679
home                27.151800
depot               19.561815
other                8.920188
shop                 3.364632
work                 3.364632
visit                2.816901
business             1.095462
escort_work          0.860720
escort_education     0.704225
escort_other         0.704225
escort_home          0.547731
escort_shop          0.234742
escort_business      0.078247

The TE results look more reasonable, since the overnight activities they capture are mainly delivery and depot.

It seems to me that the outcomes heavily depend on the approach to population synthesis and the calibration of the simulation.

  2. Why would Gelato report on links that are not in the Elara logs for TE? Should these links be excluded for some reason, or were they wrongfully excluded from TE? Or is there an issue with the data type?

From the Paris East results, I've observed findings similar to what Val noted for TE.

The kpi-speed.csv file does not include rows where all speed values are 0.

I'm not sure if I fully understand the current method used. One approach could be to modify the table after the pivot operation, to identify and reinsert rows that might have been excluded due to all their speeds being 0.
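
A rough sketch of that idea in plain Python, to illustrate rather than to describe Gelato's actual code: build the pivoted table from the full list of network links so that links without any records still get a row of zeros, and order the hour columns numerically at the same time. The function name `build_speed_table`, the `(link_id, hour, speed)` record shape and the `link_id` header are all assumptions made for this example.

```python
def build_speed_table(records, all_link_ids, hours=range(24)):
    """Pivot (link_id, hour, speed) records into one row per link.

    Every link in all_link_ids gets a row, even with no records, and the
    hour columns are ordered numerically (0, 1, 2, ...) rather than as
    strings (0, 1, 10, 11, ...).
    """
    hours = sorted(int(h) for h in hours)
    # Start from an all-zero table so links with no traffic are not dropped.
    table = {link_id: {h: 0.0 for h in hours} for link_id in all_link_ids}
    for link_id, hour, speed in records:
        hour = int(hour)
        # Records outside `hours` are ignored in this sketch; how to treat
        # hours past 24 is still an open question in this thread.
        if link_id in table and hour in table[link_id]:
            table[link_id][hour] = speed
    header = ["link_id"] + [str(h) for h in hours]
    rows = [[link_id] + [table[link_id][h] for h in hours]
            for link_id in all_link_ids]
    return header, rows


records = [("a_b", "2", 8.0), ("a_b", "10", 12.5)]
header, rows = build_speed_table(records, ["a_b", "c_c"])
print(header[:4])
print(rows[1][:4])
```

Here `c_c` has no records at all but still appears as a row of zeros, which is the behaviour asked for in point 3 of the original report.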

The kpi-speed.csv file also has a lot of links that can't be found in the Elara speed output link_vehicle_speeds_car_average.csv. I suspect the extraction approach differs slightly between Gelato and Elara. This needs more investigation.
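
As a starting point for that investigation, a small sketch that compares the link IDs in the two files as plain strings. The column names `link_id` and `id` are assumptions (the real headers should be checked), and the in-memory CSVs below are made-up stand-ins for kpi-speed.csv and link_vehicle_speeds_car_average.csv. Reading IDs as strings also helps rule out the data type question raised earlier, since 19-digit IDs cannot be represented exactly once parsed as floats.

```python
import csv
import io


def read_link_ids(csv_file, id_column):
    """Return the set of IDs in id_column, kept as stripped strings."""
    return {row[id_column].strip() for row in csv.DictReader(csv_file)}


def compare_link_ids(gelato_ids, elara_ids):
    """Report which IDs appear in only one of the two outputs, or in both."""
    return {
        "only_in_gelato": gelato_ids - elara_ids,
        "only_in_elara": elara_ids - gelato_ids,
        "in_both": gelato_ids & elara_ids,
    }


# Made-up stand-ins for the two files; column names are assumptions.
gelato_csv = io.StringIO("link_id,0,1\na_a,0,0\na_b,10.0,12.0\n")
elara_csv = io.StringIO("id,0,1\na_b,9.8,12.1\nb_c,7.0,7.5\n")

result = compare_link_ids(
    read_link_ids(gelato_csv, "link_id"),
    read_link_ids(elara_csv, "id"),
)
print(sorted(result["only_in_gelato"]))
```

With real files, `open(path, newline="")` would replace the `io.StringIO` objects.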