Closed by cczhu 3 years ago.
Convert Postgres data to zips for TEPS
From myself 2020-02-14
We're testing whether we can generate AADT estimates for 2017-2018 by feeding new short-term and permanent counts into TEPs-I. As part of this, I'm also hoping to determine which files inside the Emission folder are needed for TEPs-II, and which files Arman used only for testing.
I've generated a preliminary dataset of AADT estimates for EED. These numbers are not final, as we're still QCing the results, but I was hoping we could test the data handover process with you so that we know exactly which files we need to send in the future. Would you be willing to help us run TEPs-II on the dataset? The files are large, and I can either upload them to Google Drive or use the secure FTP system you used to send TEPs-II to me. If you prefer the latter, do you know how I can sign up for that service and upload files?
From Sean Severin 2020-02-24
(After some back and forth with me to confirm the validity of his VKT numbers.)
I've just finished re-running and summing the data from the model run after deleting all the old stuff, and the numbers have not changed.
TEPs-I: `/mnt/c/Users/czhu5/Documents/VolumeModel/TEPs-I-EEDrun/`
Aakash refreshed `prj_volume.centreline_volumes` on 2020-12-14. I forgot that we're reading from `czhu.btp_centreline_volumes`, so I refreshed `prj_volume.centreline_volumes`.
Problem: `prj_volume.centreline_volumes` contains both ATR and TMC counts; the former have `count_type = 1` and the latter `count_type = 2`. TMC counts aren't 24-hour counts, and neither Traffic Prophet nor TEPs has controls that ensure they're rejected by the count-matching algorithm. We should remove them from the zip-making procedure.
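A minimal sketch of that filter, assuming each row pulled from `prj_volume.centreline_volumes` is a dict with a `count_type` key. The other column names (`centreline_id`, `volume`) and the helper name `keep_atr_counts` are hypothetical; in the database itself the equivalent is simply a `WHERE count_type = 1` clause on the query that feeds the zips.

```python
from typing import Iterable, Iterator

ATR_COUNT_TYPE = 1  # ATR (24-hour) counts, per this issue
TMC_COUNT_TYPE = 2  # TMC counts (not 24-hour); these must not reach TEPs


def keep_atr_counts(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield only ATR rows so TMC counts never enter the zip-making step."""
    for row in rows:
        if row["count_type"] == ATR_COUNT_TYPE:
            yield row


# Hypothetical rows; real rows carry more columns.
rows = [
    {"centreline_id": 101, "count_type": 1, "volume": 12000},
    {"centreline_id": 102, "count_type": 2, "volume": 850},
    {"centreline_id": 103, "count_type": 1, "volume": 4300},
]
atr_rows = list(keep_atr_counts(rows))
print([r["centreline_id"] for r in atr_rows])  # [101, 103]
```

Filtering on an explicit whitelist (`count_type == 1`) rather than excluding `2` means any future count type is also kept out until someone decides it belongs.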
The trouble is that they were already included back in March, in the 2017 and 2018 zips.
New strategy:
Running TEPs-I for 2017 with the following settings (which should be identical to those in this post):
The working directory here is Documents\VolumeModel\TEPS-I-EEDrun\
I pressed `Estimate AADTs`. I'll press `Estimate Vehicle Speeds` next, following this comment.
Irritatingly, I never documented which files this process generates, so I'll have to do that later.
All done!
Now for speeds.
Running with these settings:
All done!
Zipped files for EED
By running `TEPs Output Analysis.ipynb` (copied from `20200218-tepsrerunwithnewdata`), we can analyze the new "proposed" data against the older "baseline" run from February 2020. Recall that this run erroneously includes 7.5-hour TMCs as short-term counts in 2017-2018.
There are four tests included in `TEPs Output Analysis.ipynb`:
This assumes that the value and lower and upper bounds given in the `final_aadt_{year}.csv` files coming out of TEPs represent estimates of the mean and twice the standard deviation at each centreline and year. The difference between the old and new estimates, then, should go like
(That's a lot of assumptions, so this check needs to be taken with a grain of salt.)
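One plausible reading of those assumptions, adding a further one that the baseline and proposed errors are independent, is sketched below: each bound pair spans mean ± 2σ, so σ = (upper - lower) / 4, and the difference then has σ_diff = sqrt(σ_baseline² + σ_proposed²). The `(value, lower, upper)` tuple layout and the function names are hypothetical, not the notebook's actual code.

```python
import math


def sigma_from_bounds(lower: float, upper: float) -> float:
    """Bounds are assumed to be mean +/- 2 sigma, so the band spans 4 sigma."""
    return (upper - lower) / 4.0


def z_score(baseline: tuple, proposed: tuple) -> float:
    """z-score of (proposed - baseline), assuming independent Gaussian errors.

    Each argument is a (value, lower, upper) tuple for one centreline and year.
    """
    b_val, b_lo, b_hi = baseline
    p_val, p_lo, p_hi = proposed
    # Independent errors add in quadrature.
    sigma_diff = math.hypot(sigma_from_bounds(b_lo, b_hi),
                            sigma_from_bounds(p_lo, p_hi))
    return (p_val - b_val) / sigma_diff


# Hypothetical centreline: baseline 10000 (8000-12000), proposed 10500 (8500-12500).
print(round(z_score((10000, 8000, 12000), (10500, 8500, 12500)), 3))  # 0.354
```

If the two runs share most of their input counts, their errors are strongly correlated and the true σ_diff is smaller than this, which would push the z-scores wider than a standard normal.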
In the following, we plot histograms of the z-score of the differences between proposed and baseline AADTs (orange bars). A standard normal (blue line) is also plotted for comparison. In the past, the distributions have all been much more sharply peaked than the standard normal, but now we get weird plots like this:
I'm honestly not sure how to interpret this plot, but apparently I didn't place much faith in this check back in February either, since I never published any plots in the GitHub issue. So let's ignore this and go for a direct comparison between AADTs.
This is a plot of the proposed vs. the baseline data on a linear scale:
We see two distributions - one that lands pretty close to the y = x line (blue dashed), and a tail of outliers shallower than y = x close to the origin. Plotting the relative difference:
Errors at extremely large AADTs are typically at or below the 10% level, while there's a tail of 100-1000% error for very small AADTs. The shallow tail of outliers in the absolute plot can also be seen in the relative plot, at around 30% relative error and < 0.5e5 baseline AADT.
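A small sketch of the relative difference, assuming it is defined as (proposed - baseline) / baseline; the notebook's exact definition isn't shown here, and the AADT values are made up. It illustrates why the largest fractional errors sit at small AADTs: a modest absolute shift on a quiet road is a large fraction of its volume.

```python
def relative_difference(baseline: float, proposed: float) -> float:
    """Fractional deviation of the proposed AADT from the baseline AADT."""
    if baseline <= 0:
        raise ValueError("baseline AADT must be positive")
    return (proposed - baseline) / baseline


# Hypothetical segments: a +400 vehicle/day shift on a quiet local road is 200%,
# while a far larger +15000 shift on a highway is only 10%.
print(relative_difference(200, 600))        # 2.0
print(relative_difference(150000, 165000))  # 0.1
```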
I also want to check that these errors are not geospatially clustered. We can do this by plotting the fractional deviation on a map for a given year:
I don't think there's any kind of visible spatial clustering here - instead, it looks like overall there is 10-20% error for AADT on minor roads, and closer to a few percent error on major roads.
The hope is these errors cancel each other out when summing up to the citywide VKT.
And they do!
Here's a plot of the citywide VKT from both baseline and proposed data for all years in common:
And here's their relative error:
The relative error between baseline and proposed is less than 1%. It's probably not worth digging further than this if the data are only used for a citywide estimate.
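To illustrate how segment-level errors can cancel in the citywide total, here is a sketch assuming daily VKT is the sum over centreline segments of AADT × segment length in km (how TEPs aggregates internally isn't documented here). The segment IDs, lengths, and AADTs are hypothetical: two minor roads with ±20% errors cancel, leaving only the major road's 0.5% error, diluted across the total.

```python
def citywide_vkt(aadt_by_segment: dict, length_km_by_segment: dict) -> float:
    """Daily vehicle-kilometres travelled: sum of AADT x segment length (km).

    Multiply by 365 for annual VKT; the relative error is unchanged.
    """
    return sum(aadt * length_km_by_segment[seg]
               for seg, aadt in aadt_by_segment.items())


lengths = {"major": 2.0, "minor_1": 0.5, "minor_2": 0.5}  # km, hypothetical
baseline = {"major": 40000, "minor_1": 3000, "minor_2": 3000}
proposed = {"major": 40200, "minor_1": 3600, "minor_2": 2400}  # +0.5%, +20%, -20%

vkt_base = citywide_vkt(baseline, lengths)
vkt_prop = citywide_vkt(proposed, lengths)
print(vkt_base, vkt_prop, round((vkt_prop - vkt_base) / vkt_base, 4))  # 83000.0 83400.0 0.0048
```

Cancellation like this only holds if the segment errors are roughly unbiased; a systematic shift on minor roads would survive the sum.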
Deleted all extraneous files from `Emission`, following the format of `EED_20200214_minimalist.zip`, and uploaded the result to Google Drive. Will close this issue upon successful delivery.
Hi @cczhu - thanks for the detailed yet clear error checking here. I also have no clue how to interpret that z-score chart, but the rest of it looked great. Love the maps as always. I'm going to send it off.
A few other random comments / thoughts:
Back in February (e-mail 2021-02-11), Sean discovered that there were >5% changes in the historical eCO2 estimates from TEPs-II when using this most recent run of AADTs and speeds we sent him:
Considering that the AADTs themselves change by < 1%, we suspected a bug in either the speed estimation or the emissions factor portion of TEPs. We contacted Arman for assistance. He found that the estimated AADTs had left the range of values the neural network was trained on, leading to spurious predictions that were highly sensitive to AADT (e-mail 2021-03-15). He addressed the issue by replacing the EMME-2011 simulation speed-volume data with EMME-2016 data, which he sent back to us (e-mail 2021-03-16).
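Neural networks are generally unreliable when extrapolating beyond their training data, so one cheap guard is to flag inputs outside the range seen during training before trusting the predictions. This sketch checks a single feature (AADT) against a hypothetical list of training AADTs; the actual inputs of the TEPs network aren't described here, and this is not Arman's fix, which was to change the training data.

```python
from typing import List, Sequence


def out_of_training_range(inputs: Sequence[float],
                          training_inputs: Sequence[float]) -> List[int]:
    """Indices of inputs that fall outside [min, max] of the training inputs."""
    lo, hi = min(training_inputs), max(training_inputs)
    return [i for i, x in enumerate(inputs) if not lo <= x <= hi]


# Hypothetical numbers: AADTs seen in training vs. newly estimated AADTs.
training_aadts = [800, 5000, 22000, 61000]
new_aadts = [1200, 30000, 74000, 450]
print(out_of_training_range(new_aadts, training_aadts))  # [2, 3]
```

A min/max check only catches per-feature extrapolation; inputs can be individually in range yet jointly unlike anything in the training set.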
To confirm that we can use Arman's updated data, I'm running only TEPs-I's speed estimation module, with these settings:
Made a remarkably stupid mistake and only ran 2006-2013 speeds, as seen above. Revised to 2006-2019:
The run completed successfully, but TEPS crashed when zipping the files.
Addendum - it worked, and it took an overly long time because it included the archive zip files as well as the new files...
Addendum 2 - man Windows sucks at removing zip files from zip files. I apologize for blaming TEPS for something that's a Windows problem.
After much haranguing, I could consistently reproduce a TEPS bug where folders I've placed in the Recycle Bin and then deleted get included when TEPS produces a zip for TEPS-II. This literally blew my mind. I'm astonished this can happen, and vehemently refuse to try restarting my computer to see if it'll fix it.
Since "zip for TEPS-II" is literally just zipping up the `Emission` folder, I'll do that manually and will advise my future self to do so as well.
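A minimal sketch of that manual step using Python's standard `zipfile` module. It skips any existing `.zip` files inside the folder, which avoids the nested-archive slowdown mentioned above. Storing entries under a top-level `Emission/` directory is an assumption about what TEPs-II expects, and `zip_emission_folder` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def zip_emission_folder(emission_dir, out_zip) -> Path:
    """Zip every file under emission_dir, skipping existing .zip archives."""
    emission_dir = Path(emission_dir)
    out_zip = Path(out_zip).resolve()
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(emission_dir.rglob("*")):
            if not path.is_file():
                continue
            if path.suffix.lower() == ".zip" or path.resolve() == out_zip:
                continue  # don't nest old archives (or the output itself)
            # Assumed layout: entries live under a top-level "Emission/" folder.
            arcname = path.relative_to(emission_dir.parent).as_posix()
            zf.write(path, arcname=arcname)
    return out_zip
```

Because only files that actually exist on disk are walked, deleted folders cannot end up in the archive.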
Sent a zip file to Sean to run TEPs-II. He obtained:
vs. Arman's results using my TEPs-I output while generating new neural network training data:
I'm not 100% certain why they aren't identical (since running our data through an NN in feedforward mode should be deterministic), but the differences are small enough that I can't bring myself to care 🤷♂️.
Since we've now confirmed we can obtain sensible eCO2 measures from TEPs, closing this issue.
Turns out Sean sent over the wrong results. Here (from e-mail 2021-03-31) are his emissions calculations using the speeds Arman generated:
and his results using the speeds I generated:
They're identical, so case closed.
Environment & Energy Division (EED) requests 2019 data to complete their GHG inventory.
Follow the process documented in #41.
Tasks:
`PRTCS/negative/mid_f_point.csv` and `Landuse_pop_lane_speed.xlsx`).