TEPs-I 2017/2018 Outputs for EED

cczhu commented 4 years ago

Environment & Energy Division (EED) requests 2017/2018 data to complete their GHG inventory.

Following #4 (which will never be closed, ever, since it's about running TEPs-I in general rather than for some particular purpose...), we need to be sure we're able to include new data from 2017 and 2018 to generate city VKT and speed estimates for 2017 and 2018.

Tasks:

[x] First confirm that it's currently impossible to make 2017 and 2018 estimates, since there are no 15-minute count bin zips for those years.
[x] Produce 15-minute counts for those years.
- [x] Since we're restricted to information TEPs has on locations, determine which centreline IDs exist in TEPs PRTCS and KCOUNT data (this includes PRTCS/negative/mid_f_point.csv and Landuse_pop_lane_speed.xlsx).
- [x] Determine if there's any conflict between centreline IDs assigned to HW401 data and
[x] Confirm TEPs can be successfully run using new data.
[x] Run TEPs using data only up to 2016 for year of analysis 2016, then again with 2017 and 2018 data included. Compare the difference.
[x] Run TEPs using all available data for years of analysis 2017 and 2018.

cczhu commented 4 years ago

Confirmed that changing any of 'Year of Analysis', 'Start year' or 'End year' leads to an error message.

In the backend, TEPs checks whether the 15-minute bin zip exists whenever the drop-down menu is used.
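A quick way to see ahead of time which years would trip this check is to look for the zips ourselves. This is only a sketch: the file naming pattern and the folder argument below are hypothetical placeholders, not the names TEPs actually uses.

```python
from pathlib import Path


def missing_count_zips(data_dir, years, pattern="15min_counts_{year}.zip"):
    """Return the years that have no 15-minute count zip in `data_dir`.

    `pattern` is a hypothetical naming scheme used for illustration only;
    substitute whatever the TEPs input folder really contains.
    """
    data_dir = Path(data_dir)
    return [
        year for year in years
        if not (data_dir / pattern.format(year=year)).is_file()
    ]
```

Any year returned by this function would need its 15-minute counts produced before it can be selected in the UI.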

cczhu commented 4 years ago

Successful attempt at running TEPs-I:

Correlation for PRTCS is extremely low, however:

suggesting some of the systematic issues with TEPs-I might be decreasing predictive accuracy away from ~2010.

This suggests we should look into generating preliminary estimates with CountMatch and feeding the results into KCOUNT, but in the meantime we'll also attempt to run with 2018 data and a broader range of years over this weekend.

cczhu commented 4 years ago

Will now attempt a run using:

cczhu commented 4 years ago

Run completed successfully, and generated 183 figures...

Side note - closing 183 figures using "close all windows" leads to a joyful animation in Windows 10.

cczhu commented 4 years ago

Run including speeds completed, producing the following (plus 190 other plots):

Issue - 2017 and 2018 speeds were not produced. Investigating.

cczhu commented 4 years ago

False alarm - Avg_speed_{POS_OR_NEG}_{YEAR}.csv were produced in the outputs folder, not inputs.

Now that speeds have successfully been exported to Emission/input and Emission/output, I'm examining which files are crucial for TEPs-II. A few notes on TEPs-II's codebase are now here. I'm confused by there being a toronto_sim.m in TEPs-I, and a toronto_sim_EED.m in TEPs-II - which should be run?

cczhu commented 4 years ago

Not anywhere close to done analyzing the code structure of TEPs-II, but it does appear that the ANN is only trained when calculating vehicle speeds, which is included only with TEPs-I. The ANN weights are used to predict counterfactuals in TEPs-II, which requires weights files (such as gregnet) from TEPs-I. This will make it effectively impossible to use empirical speeds rather than ANN speeds without Arman's assistance in modifying TEPs-II. It might also be impossible to move all speed calculations into TEPs-II without his help, leaving a vestigial vehicle speeds estimator application that we're only able to run using TEPs rather than Traffic Prophet.

cczhu commented 4 years ago

Turns out that ANN_HTC.m, ANN_VehSpeed_alone.m and make_data_for_ANN_Veh_alone.m are identical between TEPs-I/Emission and TEPs-II. The same is true of toronto_sim.m and toronto_sim_EED.m, except that toronto_sim_EED.m has this additional block:

for i=1:length(senario_site_ids)
    if ~isempty(id(find(id==senario_site_ids(i))))
        nominated(i)=1+reduced.Value/100;
    else
        nominated(i)=1;
    end
    AADT(find(id==senario_site_ids(i)))=AADT(find(id==senario_site_ids(i)))*nominated(i);
end

This block scales the AADTs by the "% Change in AADT" value when running counterfactual scenarios using TEPs-II.
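For anyone reading this from the Traffic Prophet side, the same scaling can be sketched in Python. The function and argument names here are my own, not anything in TEPs, and this is an approximation of the MATLAB loop rather than a port of it.

```python
def scale_scenario_aadt(ids, aadt, scenario_site_ids, pct_change):
    """Scale AADT by (1 + pct_change / 100) at the scenario sites.

    ids, aadt         : parallel sequences of centreline IDs and AADTs
    scenario_site_ids : IDs selected for the counterfactual (hypothetical name)
    pct_change        : the "% Change in AADT" value, e.g. 10 for +10%
    """
    selected = set(scenario_site_ids)
    factor = 1.0 + pct_change / 100.0
    # Sites outside the scenario keep their AADT unchanged.
    return [a * factor if i in selected else float(a) for i, a in zip(ids, aadt)]
```

One difference worth noting: the MATLAB loop applies the factor once per entry in senario_site_ids, so a duplicated ID there would be scaled twice, whereas this sketch scales each matching site exactly once.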

cczhu commented 4 years ago

Completed an inventory of TEPs-II's code structure, here. Determined an inventory of files (found at the end of that document) required for TEPs-II to run.

Given the way the code is structured, it's not currently possible to inject empirically measured speeds on roads, but a simple change should make this possible. Emissions factors are handled using TAF2F.m, which reads in Avg_speed_{POS_OR_NEG}_{YEAR}.csv for speeds. We can overwrite this file, but it is automatically rewritten by toronto_sim_EED.m. This is because, if the user specifies counterfactuals (an AADT multiplicative factor for certain streets) in the TEPs-II UI, the ANN needs to be called to predict new speeds (since more congestion == less speed). Arman (or I) will need to comment out the handling of speed prediction in TAF2F.m entirely; that would then allow us to dump empirical speeds into Avg_speed_{POS_OR_NEG}_{YEAR}.csv.

It's also not currently possible to train the ANN using TEPs-II, but this isn't as big an issue. The two ANNs use input_new_AM.csv and output_new_AM.csv, and input_new2.csv and output_new2.csv, respectively, none of which use Toronto data. We can therefore safely continue to use gregnet1 and gregnet2b without running the Emission module in TEPs-I, provided we can emulate input_for_toronto_sim4_{POS_OR_NEG}_{YEAR}.csv using Traffic Prophet.

cczhu commented 4 years ago

Compared AADT estimates taken directly from Arman's OneDrive against values I recalculated for #4 (see here for settings) and values I calculated that include 2017-2018 PTC and STTC data. I'll refer to these as "baseline", "recalc" and "proposed", respectively (perhaps poorly named). The full, more detailed analysis is in the sandbox branch, under 20200218-tepsrerunwithnewdata.

For a citywide inventory, we are most interested in citywide (annually averaged daily) VKT. This value for all three runs is plotted below as a function of time:

Here are the absolute fractional errors (abs(proposed - baseline) / baseline):

The recalc VKT closely follows the baseline one, deviating by less than 0.5% in every year compared. The proposed run, with 2017-2018 data, does deviate from Arman's baseline, but the difference drops below 0.5% for 2014-2016, and year-on-year growth seems fairly constant between those years and 2017-2018.
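For reference, the comparison above boils down to two small calculations, sketched here in plain Python. I'm assuming the usual definition of daily VKT as the sum of AADT times segment length; the input layouts (pairs of AADT and length, and dicts keyed by year) are hypothetical and not how the sandbox notebook stores things.

```python
def citywide_vkt(segments):
    """Sum AADT x length over road segments (vehicle-km per average day).

    `segments` is an iterable of (aadt, length_km) pairs.
    """
    return sum(aadt * length_km for aadt, length_km in segments)


def abs_fractional_error(run, baseline):
    """abs(run - baseline) / baseline for each year present in both runs.

    `run` and `baseline` map year -> citywide VKT.
    """
    return {
        year: abs(run[year] - baseline[year]) / baseline[year]
        for year in sorted(run.keys() & baseline.keys())
    }
```

Years that appear in only one run (such as 2017 and 2018, which the baseline lacks) are skipped rather than treated as errors.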

Comparing baseline to recalc values directly, there doesn't appear to be any systematic offset in AADTs:

The same appears broadly true when comparing proposed (with 2017-2018 data) with baseline, though there is a more significant variance:

What is hard to see in this plot, however, is that there's a systematic ~10% increase in proposed AADTs compared to baseline for years before 2010. This systematic offset isn't heavily dependent on the magnitude of the AADT (or else it would show up on the above plot). It can be seen by mapping the absolute fractional error (abs(proposed - baseline) / baseline) for 2006:

and 2016:

We see the errors are closer to 1% for 2016, but ~10% for 2006. This temporal dependence of the error is probably because the growth rate changes when more data is added. It also appears that errors are spatially clustered: e.g. the area around Eglinton and Avenue, St. Clair & Mt. Pleasant, and the area around Queen E. and the DVP are all hotspots of high error. This may be a consequence of spatial regression propagating errors produced by PRTCS.
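To put a single number on each year rather than eyeballing maps, the per-centreline errors can be summarised by year. This is a rough sketch only; keying AADTs by (centreline_id, year) is a hypothetical layout chosen for the example, and the median is just one reasonable summary.

```python
from collections import defaultdict
from statistics import median


def median_error_by_year(proposed, baseline):
    """Median absolute fractional AADT error for each year.

    Both arguments map (centreline_id, year) -> AADT. Only keys present
    in both runs, with a positive baseline AADT, are compared.
    """
    errors = defaultdict(list)
    for key, base in baseline.items():
        if key in proposed and base > 0:
            year = key[1]
            errors[year].append(abs(proposed[key] - base) / base)
    return {year: median(vals) for year, vals in sorted(errors.items())}
```

A summary like this captures the temporal trend but not the spatial clustering, so the maps are still needed to spot hotspots.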

In conclusion:

Results don't drastically change between the three runs, particularly when looking at aggregate VKT for years after 2014.
AADT predictions for individual centrelines don't change very much, though some noise is introduced.
If new data is added, predictions do change, and these changes become more significant the further back in time we go. This is likely because the growth rate changes when new permanent count stations are included in the data.

cczhu commented 4 years ago

Sent a zip of proposed 2017-2018 data to Sean Severin at EED to analyze using TEPs-II.

cczhu commented 4 years ago

Sean confirms our zip produces the same numbers with and without the extraneous files under TEPs-II/Emission. We can now tie this work off. Some concluding thoughts:

EED's final report deadline is early June, so we need to generate more refined numbers by late April at the latest.
EED will have to justify GHG inventory changes of ~5% or higher. This means that if Traffic Prophet estimates a VKT difference greater than roughly 5%, we should be concerned.
To estimate how much Traffic Prophet will change predictions, we can also run TEPs-I's KCOUNT and LSVR on CountMatch outputs.

We can address these issues at a later date.

CityofToronto / bdit_traffic_prophet #41