CityofToronto / bdit_traffic_prophet

Suite of algorithms for predicting average daily traffic on Toronto streets
GNU General Public License v3.0

TEPs-I 2017/2018 Outputs for EED #41

Closed cczhu closed 4 years ago

cczhu commented 4 years ago

Environment & Energy Division (EED) requests 2017/2018 data to complete their GHG inventory.

Following #4 (which will never be closed, ever, since it's about running TEPs-I in general rather than for some particular purpose...), we need to be sure we're able to include new data from 2017 and 2018 to generate city VKT and speed estimates for 2017 and 2018.

Tasks:

cczhu commented 4 years ago

Confirmed that changing any of 'Year of Analysis', 'Start year' or 'End year' leads to the error message:

image

In the backend, whenever a drop-down menu is used, TEPs checks whether the 15-minute bin zip exists.

cczhu commented 4 years ago

Successful attempt at running TEPs-I:

image

Correlation for PRTCS is extremely low, however:

image

suggesting some of the systematic issues with TEPs-I might be decreasing predictive accuracy away from ~2010.

This suggests we should look into generating preliminary estimates with CountMatch and feeding the results into KCOUNT, but in the meantime we'll also attempt to run with 2018 data and a broader range of years over this weekend.

cczhu commented 4 years ago

Will now attempt a run using:

image

cczhu commented 4 years ago

Run completed successfully, and generated 183 figures...

image

Side note - closing 183 figures using "close all windows" leads to a joyful animation in Windows 10.

cczhu commented 4 years ago

Run including speeds completed, producing the following (plus 190 other plots):

image

Issue - 2017 and 2018 speeds were not produced. Investigating.

cczhu commented 4 years ago

False alarm - the Avg_speed_{POS_OR_NEG}_{YEAR}.csv files were produced in the outputs folder, not inputs.

Now that speeds have successfully been exported to Emission/input and Emission/output, I'm examining which files are crucial for TEPs-II. A few notes on TEPs-II's codebase are now here. I'm confused by there being a toronto_sim.m in TEPs-I, and a toronto_sim_EED.m in TEPs-II - which should be run?

cczhu commented 4 years ago

Not anywhere close to done analyzing the code structure of TEPs-II, but it does appear that the ANN is only trained when calculating vehicle speeds, which is included only with TEPs-I. The ANN weights are used to predict counterfactuals in TEPs-II, which requires weights files (such as gregnet) from TEPs-I. This will make it effectively impossible to use empirical speeds rather than ANN speeds without Arman's assistance in modifying TEPs-II. It might also be impossible to move all speed calculations into TEPs-II without his help, leaving a vestigial vehicle speeds estimator application that we're only able to run using TEPs rather than Traffic Prophet.

cczhu commented 4 years ago

Turns out that ANN_HTC.m, ANN_VehSpeed_alone.m and make_data_for_ANN_Veh_alone.m are identical between TEPs-I/Emission and TEPs-II. This is also true of toronto_sim.m and toronto_sim_EED.m, except that toronto_sim_EED.m has this additional block:

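% For each scenario site, build a multiplicative factor and apply it to the AADT.
for i=1:length(senario_site_ids)
    % Sites present in id get the % change from the UI; all others keep a factor of 1.
    if ~isempty(id(find(id==senario_site_ids(i))))
        nominated(i)=1+reduced.Value/100;
    else
        nominated(i)=1;
    end
    % Scale the AADT of every row in id that matches this scenario site.
    AADT(find(id==senario_site_ids(i)))=AADT(find(id==senario_site_ids(i)))*nominated(i);
end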

This block scales the AADTs by the "% Change in AADT" setting when running counterfactual scenarios in TEPs-II.
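If this step ever needs to be emulated outside MATLAB, the same scaling could be sketched in Python roughly as follows. This is a minimal sketch, not the TEPs-II implementation: `scale_aadt`, `site_ids`, `scenario_site_ids` and `pct_change` are hypothetical names, with `pct_change` standing in for `reduced.Value`. It assumes each scenario site ID is listed only once (the MATLAB loop would apply the factor again for a repeated ID).

```python
def scale_aadt(site_ids, aadt, scenario_site_ids, pct_change):
    """Return a copy of `aadt` with scenario sites scaled by (1 + pct_change / 100).

    Hypothetical helper; assumes scenario_site_ids contains no duplicates.
    """
    scenario = set(scenario_site_ids)
    factor = 1.0 + pct_change / 100.0
    # Sites not named in the scenario keep their original AADT.
    return [value * factor if site in scenario else value
            for site, value in zip(site_ids, aadt)]


# Made-up site IDs and AADTs, for illustration only.
scaled = scale_aadt([101, 102, 103], [1000.0, 2000.0, 3000.0], [102, 999], -10.0)
print(scaled)
```

Scenario IDs that match no site (999 here) are simply ignored, mirroring the `isempty` branch in the MATLAB block.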

cczhu commented 4 years ago

Completed an inventory of TEPs-II's code structure, here. Determined an inventory of files (found at the end of that document) required for TEPs-II to run.

Given the way the code is structured, it's not currently possible to inject empirically measured speeds on roads, but a simple change should make this possible. Emissions factors are handled using TAF2F.m, which reads in Avg_speed_{POS_OR_NEG}_{YEAR}.csv for speeds. We can overwrite this file, but it is automatically rewritten by toronto_sim_EED.m. This is because if counterfactuals - AADT multiplicative factors for certain streets - are specified by the user in the TEPs-II UI, the ANN needs to be called to predict new speeds (since more congestion == less speed). Either Arman or I will need to comment out the handling of speed prediction in TAF2F.m entirely; that would then allow us to dump empirical speeds into Avg_speed_{POS_OR_NEG}_{YEAR}.csv.

It's also not currently possible to train the ANN using TEPs-II, but this isn't as big an issue. The two ANNs use input_new_AM.csv and output_new_AM.csv, and input_new2.csv and output_new2.csv, respectively, none of which use Toronto data. We can therefore safely continue to use gregnet1 and gregnet2b without running the Emission module in TEPs-I, provided we can emulate input_for_toronto_sim4_{POS_OR_NEG}_{YEAR}.csv using Traffic Prophet.

cczhu commented 4 years ago

Compared AADT estimates directly from Arman's OneDrive against values I recalculated for #4 (see here for settings) and values I calculated that include 2017-2018 PTC and STTC data. I'll refer to these as "baseline", "recalc" and "proposed", respectively (perhaps poorly named). The full, more detailed analysis is in the sandbox branch, under 20200218-tepsrerunwithnewdata.

For a citywide inventory, we are most interested in citywide (annually averaged daily) VKT. This value for all three runs is plotted below as a function of time:

image

Here are the absolute fractional errors (abs(proposed - baseline) / baseline):

image

The recalc VKT closely follows the baseline, deviating by less than 0.5% in every year compared. The proposed run, with 2017-2018 data, does deviate from Arman's baseline, but the difference falls below 0.5% for 2014-2016, and year-on-year growth seems pretty constant between those years and 2017-2018.
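For reference, the error metric used here, abs(run - baseline) / baseline, can be computed per year with a few lines of Python. This is only a sketch: the function name, the dict-of-years layout and the VKT numbers are made up for illustration and are not the values from the plots.

```python
def abs_fractional_error(run, baseline):
    """abs(run - baseline) / baseline for each year present in both runs."""
    return {year: abs(run[year] - baseline[year]) / baseline[year]
            for year in sorted(run.keys() & baseline.keys())}


# Made-up citywide VKT values (arbitrary units), for illustration only.
baseline_vkt = {2015: 100.0, 2016: 102.0}
proposed_vkt = {2015: 100.4, 2016: 102.3, 2017: 104.0}

errors = abs_fractional_error(proposed_vkt, baseline_vkt)
# True when every overlapping year deviates by less than 0.5%.
within_half_percent = all(err < 0.005 for err in errors.values())
print(errors, within_half_percent)
```

Years that exist in only one run (2017 here) are skipped, since there is no baseline to compare against.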

Comparing baseline to recalc values directly, there doesn't appear to be any systematic offset in AADTs:

image

The same appears broadly true when comparing proposed (with 2017-2018 data) with baseline, though the scatter is noticeably larger:

image

What is hard to see in this plot, however, is that there's a systematic ~10% increase in proposed AADTs compared to baseline for years before 2010. This systematic offset isn't heavily dependent on the magnitude of the AADT (or else it would show up on the above plot). It can be seen by mapping the absolute fractional error (abs(proposed - baseline) / baseline) for 2006:

image

and 2016:

image

We see the errors are closer to 1% for 2016, but ~10% for 2006. This temporal dependence of the error is probably because the growth rate changes when more data is added. It also appears that errors are spatially clustered - e.g. the area around Eglinton and Avenue, St. Clair & Mt. Pleasant, and the area around Queen E. and the DVP are all hotspots of high error. This may be a consequence of spatial regression propagating errors produced by PRTCS.
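One way to check the temporal dependence directly, without relying on maps, would be to summarize the signed fractional difference per year, since the absolute error hides whether proposed sits above or below baseline. The sketch below assumes the AADTs are available as (year, baseline, proposed) rows; the function name and the numbers are hypothetical and chosen only to mimic the pattern described above.

```python
from collections import defaultdict
from statistics import median


def median_offset_by_year(rows):
    """Median of (proposed - baseline) / baseline per year.

    rows: iterable of (year, baseline_aadt, proposed_aadt) tuples.
    """
    by_year = defaultdict(list)
    for year, baseline, proposed in rows:
        by_year[year].append((proposed - baseline) / baseline)
    return {year: median(diffs) for year, diffs in sorted(by_year.items())}


# Made-up segment AADTs, for illustration only.
rows = [
    (2006, 1000.0, 1100.0), (2006, 5000.0, 5450.0), (2006, 200.0, 222.0),
    (2016, 1000.0, 1010.0), (2016, 5000.0, 4960.0), (2016, 200.0, 201.0),
]
offsets = median_offset_by_year(rows)
print(offsets)
```

A median is used rather than a mean so that a handful of spatially clustered outliers do not dominate the per-year summary.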

In conclusion:

cczhu commented 4 years ago

Sent a zip of proposed 2017-2018 data for Sean Severin at EED to analyze using TEPs-II.

cczhu commented 4 years ago

Sean confirms our zip produces the same numbers with and without the extraneous files under TEPs-II/Emission. We can now tie this work off. Some concluding thoughts:

We can address these issues at a later date.