CityofToronto / bdit_traffic_prophet

Suite of algorithms for predicting average daily traffic on Toronto streets
GNU General Public License v3.0

TEPs-I 2019 Outputs for EED #47

Closed cczhu closed 3 years ago

cczhu commented 3 years ago

Environment & Energy Division (EED) requests 2019 data to complete their GHG inventory.

Follow the process documented in #41.

Tasks:

cczhu commented 3 years ago

Relevant notebooks

Convert Postgres data to zips for TEPS

TEPs output analysis

Relevant e-mails

From myself 2020-02-14

We're testing if we can generate AADT estimates for 2017-2018 by feeding in new short term and permanent counts into TEPs-I. As part of this, I'm also hoping to determine which files inside of the Emission folder are needed for TEPs-II, and which files were only used by Arman for testing.

I've generated a preliminary dataset of AADT estimates for EED. These numbers are not final, as we're still QCing the results, but I was hoping we could test the data handover process with you to make sure we know exactly which files we need to send in the future. Would you be willing to help us to run TEPs-II on the dataset? The files are large, and I can either upload them to Google Drive or use the secure FTP system you used to send TEPs-II to me. If you prefer the latter, do you know how I can sign up for that service and upload files?

From Sean Severin 2020-02-24

(After some back and forth with me to confirm the validity of his VKT numbers.)

I've just finished re-running and summing the data from the model run after deleting all the old stuff, and the numbers have not changed.

Local paths

TEPs-I: /mnt/c/Users/czhu5/Documents/VolumeModel/TEPs-I-EEDrun/

cczhu commented 3 years ago

Aakash refreshed prj_volume.centreline_volumes on 2020-12-14. I forgot that we're reading from czhu.btp_centreline_volumes, so I refreshed czhu.btp_centreline_volumes as well.

cczhu commented 3 years ago

Problem: prj_volume.centreline_volumes contains both ATR and TMC counts - the former is count_type = 1 and the latter 2. TMC counts aren't 24 hour counts, and neither Traffic Prophet nor TEPs have controls that ensure they're rejected by the count matching algorithm. We should remove them from the zip-making procedure.

Trouble is they were included back in March for the 2017 and 2018 zips.
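For reference, the filter described above could look something like the sketch below. It uses plain Python on toy rows rather than the actual zip-making notebook; only `count_type` and its two values come from this thread, and `keep_atr_counts`, `centreline_id` and `volume` are hypothetical names used for illustration.

```python
ATR_COUNT_TYPE = 1  # per the comment above: 1 = ATR, 2 = TMC


def keep_atr_counts(rows):
    """Return only the rows flagged as ATR (24 hour) counts."""
    return [row for row in rows if row["count_type"] == ATR_COUNT_TYPE]


# Toy stand-in for rows read from prj_volume.centreline_volumes.
# Every key other than count_type is hypothetical.
toy_rows = [
    {"centreline_id": 101, "count_type": 1, "volume": 1200},
    {"centreline_id": 101, "count_type": 2, "volume": 450},
    {"centreline_id": 202, "count_type": 1, "volume": 980},
    {"centreline_id": 303, "count_type": 2, "volume": 300},
]

atr_only = keep_atr_counts(toy_rows)
```

Applying the filter before the zips are built means neither TEPs nor Traffic Prophet ever sees a TMC count, so no change to their count matching code is needed.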

New strategy:

cczhu commented 3 years ago

Running TEPs-I for 2017 with the following settings (which should be identical to those in this post):

image

The working directory here is Documents\VolumeModel\TEPS-I-EEDrun\

I pressed `Estimate AADTs`. I'll press `Estimate Vehicle Speeds` next, following this comment.

Irritatingly I never bothered to document which files are generated by this process, so will have to do that later.

cczhu commented 3 years ago

All done!

image

Now for speeds.

cczhu commented 3 years ago

Running with these settings:

image

cczhu commented 3 years ago

All done!

image

cczhu commented 3 years ago

Zipped files for EED

image

cczhu commented 3 years ago

Running TEPs Output Analysis.ipynb (copied from 20200218-tepsrerunwithnewdata) we can analyze the new "proposed" data against the older "baseline" run from February 2020. Recall that the baseline run erroneously includes 7.5 hour TMCs as short term counts in 2017-2018.

There are four tests included in TEPs Output Analysis.ipynb:

Z-score test

This assumes that the value and lower and upper bounds given in the final_aadt_{year}.csv files coming out of TEPs represent estimates for the mean and twice the standard deviation at each centreline and year. The difference between the old and new estimates, then, should go like

image

(That's a lot of assumptions, so this check needs to be taken with a grain of salt.)
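A minimal sketch of that z-score, under my reading of the assumptions above: the bounds sit at the mean minus and plus two standard deviations, and the baseline and proposed errors are independent, so their variances add. The function names are hypothetical, and this may not match the notebook's formula term for term.

```python
import math


def sigma_from_bounds(lower, upper):
    """Assume the bounds are mean -/+ 2 sigma, so their width is 4 sigma."""
    return (upper - lower) / 4.0


def z_score(baseline, proposed):
    """z = (proposed - baseline) / sqrt(sigma_b^2 + sigma_p^2).

    Each argument is a (value, lower, upper) tuple for one centreline and
    year. Treating the two errors as independent is an assumption.
    """
    b_val, b_lo, b_hi = baseline
    p_val, p_lo, p_hi = proposed
    sigma_b = sigma_from_bounds(b_lo, b_hi)
    sigma_p = sigma_from_bounds(p_lo, p_hi)
    return (p_val - b_val) / math.hypot(sigma_b, sigma_p)


# Toy numbers, not taken from any TEPs output.
z = z_score(baseline=(10000.0, 8000.0, 12000.0),
            proposed=(10500.0, 8500.0, 12500.0))
```

If the assumptions held, the z values across all centrelines would follow a standard normal, which is what the histograms are compared against.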

In the following, we plot histograms of the z-score of the differences between proposed and baseline AADTs (orange bars). A standard normal (blue line) is also plotted for comparison. In the past the distributions have all been much more narrowly peaked than the standard normal, but now we get weird plots like this:

image

I'm honestly not sure how to interpret this plot, but apparently I didn't place much faith in this check back in February either, since I never published any plots in the GitHub issue. So let's ignore this and go for a direct comparison between AADTs.

cczhu commented 3 years ago

Absolute and Relative Difference

This is a plot of the proposed vs. the baseline data on a linear scale:

image

We see two distributions - one that lands pretty closely to the y = x line (blue dashed), and a tail of outliers shallower than y = x close to the origin. Plotting the relative difference:

image

Errors at extremely large AADTs are typically at or below the 10% level, while there's a tail of 100-1000% error for very small AADTs. The shallow tail of outliers in the absolute plot can also be seen in the relative plot at around 30% relative error and < 0.5e5 baseline AADT.
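For clarity, a sketch of the relative difference as I'd define it, with the baseline as the denominator (that choice is an assumption, and `relative_difference` is a hypothetical name):

```python
def relative_difference(baseline, proposed):
    """Absolute difference between the two AADTs as a fraction of the baseline."""
    if baseline == 0:
        raise ValueError("relative difference is undefined for a zero baseline")
    return abs(proposed - baseline) / baseline
```

Dividing by the baseline is why small AADTs can show errors of several hundred percent: a change of 1000 vehicles per day is 5% on a 20000 AADT road but 200% on a 500 AADT road.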

cczhu commented 3 years ago

Baseline vs. proposed map:

I also want to check that these errors are not geospatially clustered. We can do this by plotting the fractional deviation on a map for a given year:

image

I don't think there's any kind of visible spatial clustering here - instead, it looks like overall there is 10-20% error for AADT on minor roads, and closer to a few percent error on major roads.

The hope is these errors cancel each other out when summing up to the citywide VKT.

cczhu commented 3 years ago

Citywide VKT estimate

And they do!

Here's a plot of the citywide VKT from both baseline and proposed data for all years in common:

image

image

And here's their relative error:

image

The relative error between baseline and proposed is less than 1%. It's probably not worth it to care much further than this if the data is only to be used for a citywide estimate.
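To illustrate why large per-segment errors on minor roads barely move the citywide number, here's a toy sketch. It assumes VKT is the sum of AADT times segment length; the segment lengths and AADTs below are made up, and `citywide_vkt` is a hypothetical helper, not the notebook's code.

```python
def citywide_vkt(segments):
    """Sum AADT x length (km) over (aadt, length_km) pairs: daily VKT."""
    return sum(aadt * length_km for aadt, length_km in segments)


# Toy network: one major road and two minor roads (made-up values).
baseline = [(20000, 1.0), (500, 0.5), (800, 0.25)]
proposed = [(20200, 1.0), (400, 0.5), (900, 0.25)]

vkt_base = citywide_vkt(baseline)
vkt_prop = citywide_vkt(proposed)
rel_err = abs(vkt_prop - vkt_base) / vkt_base
```

In this toy case the minor roads differ by 20% and 12.5%, with opposite signs, while the major road differs by 1%. The total still differs by under 1% because the major road dominates the sum and the minor road errors partly cancel.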

cczhu commented 3 years ago

Deleted all extraneous files from Emission following the format of `EED_20200214_minimalist.zip`, and uploaded to Google Drive. Will close this issue upon successful delivery.

aharpalaniTO commented 3 years ago

Hi @cczhu - thanks for the detailed yet clear error checking here. I also have no clue how to interpret that z-score chart but the rest of it looked great. Love the maps as always. I'm going to send off.

A few other random comments / thoughts:

cczhu commented 3 years ago

Back in February (e-mail 2021-02-11) Sean discovered that there were >5% changes in the historical eCO2 estimates from TEPs-II using this most recent run of AADTs and speeds we sent him:

image

Considering that the AADTs themselves change by < 1%, we suspected that this was a bug in either the speed estimation or the emissions factor portion of TEPs. We contacted Arman for assistance. He found that the estimated AADTs had left the range of values the neural network was trained on, leading to spurious predictions that were highly sensitive to AADT (e-mail 2021-03-15). He addressed the issue by replacing the EMME-2011 simulation speed-volume data with EMME-2016 data, which he sent back to us (e-mail 2021-03-16).
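A cheap sanity check for this kind of extrapolation problem is to measure how many inputs fall outside the range seen in training. The sketch below is not how Arman diagnosed the issue; the function name and the toy bounds are hypothetical.

```python
def fraction_out_of_range(values, train_min, train_max):
    """Fraction of values falling outside [train_min, train_max]."""
    if not values:
        raise ValueError("values must not be empty")
    outside = [v for v in values if v < train_min or v > train_max]
    return len(outside) / len(values)


# Toy AADTs and a toy training range, for illustration only.
frac = fraction_out_of_range([100, 5000, 90000, 120000],
                             train_min=500, train_max=100000)
```

A non-trivial fraction here would be a hint that the network is being asked to extrapolate, before any emissions numbers are computed.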

To confirm that we can use Arman's updated data, I'm running only TEPs-I's speed estimation module, with these settings:

image

cczhu commented 3 years ago

Made a remarkably stupid mistake and only ran 2006-2013 speeds, as seen above. Revised to 2006-2019:

image

cczhu commented 3 years ago

Run successfully completed, but TEPS crashed when zipping files.

Addendum - it worked, and it took an overly long time because it included the archive zip files as well as the new files...

Addendum 2 - man Windows sucks at removing zip files from zip files. I apologize for blaming TEPS for something that's a Windows problem.

cczhu commented 3 years ago

After much haranguing, I could consistently reproduce a TEPS bug where folders I've placed in the Recycle Bin and then deleted get included when TEPS produces a zip for TEPS-II. This literally blew my mind. I'm astonished this can happen, and vehemently refuse to try restarting my computer to see if it'll fix it.

Since "zip for TEPS-II" is literally just zipping up the Emission folder, I'll do that manually and will advise my future self to do so as well.
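For my future self, a sketch of one way to do the manual zip with Python's standard library, skipping any existing `.zip` archives so they don't get nested inside the new one. `zip_folder` and its arguments are hypothetical, and I haven't checked whether TEPs-II expects a particular internal layout; this keeps the top-level folder name inside the archive.

```python
import zipfile
from pathlib import Path


def zip_folder(folder, archive_path, skip_suffixes=(".zip",)):
    """Zip every file under `folder`, skipping files with the given suffixes.

    Paths inside the archive are relative to the folder's parent, so the
    archive contains e.g. Emission/... rather than bare file names.
    Returns the list of archive member names that were written.
    """
    folder = Path(folder)
    written = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if not path.is_file():
                continue  # directories are implied by their files
            if path.suffix.lower() in skip_suffixes:
                continue  # leave old archives out of the new one
            arcname = path.relative_to(folder.parent)
            zf.write(path, arcname)
            written.append(arcname.as_posix())
    return written
```

Usage would be along the lines of `zip_folder("Emission", "emission_for_teps2.zip")`, where both names are placeholders. Because `.zip` files are skipped by default, writing the archive inside the folder being zipped doesn't cause it to include itself.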

cczhu commented 3 years ago

Sent a zip file to Sean to run TEPs-II. He obtained:

image

vs. Arman's results using my TEPs-I output while generating new neural network training data:

image

I'm not 100% certain why they aren't identical (since running our data through an NN in feedforward mode should be deterministic), but the differences are small enough that I can't bring myself to care 🤷‍♂️.

Since we've now confirmed we can obtain sensible eCO2 measures from TEPs, closing this issue.

cczhu commented 3 years ago

Turns out Sean sent over the wrong results. Here (from e-mail 2021-03-31) are his emissions calculations using the speeds Arman generated:

image

and his results using the speeds I generated:

image

They're identical, so case closed.