CityofToronto / bdit_traffic_prophet

Suite of algorithms for predicting average daily traffic on Toronto streets
GNU General Public License v3.0

Run TEPs-I #4

Closed cczhu closed 3 years ago

cczhu commented 5 years ago

Run TEPs-I to predict volumes, and retrain TEPs-I to see if we can reproduce the same predictions.

MATLAB should not be required here.

cczhu commented 5 years ago

After acquiring MATLAB R2016b and installing Gnuplot 5.2.2 and R 3.6.1 (and RStudio), I can now access the TEPs-I GUI under KCOUNT/codes/App2.mlapp. Not sure if it's supposed to be here. More worryingly, attempting to run the model using the GUI with the default settings (except Working Directory, which is C:\Users\czhu5\Documents\VolumeModel\TEPS-dev\) leads to an error message saying that nansum is missing:

Error using STTC_estimate3 (line 116)
Undefined function 'nansum' for input arguments of type 'double'.

Error in main_DoM_new_2012 (line 30)
        parfor iyear=Start_year:End_year

Error in main_combined_2 (line 50)
                    main_DoM_new_2012(str1,direction{idir},FY.Value,EY.Value,base_year,allyearindex.Value,ishort_krig);

Error in App2/EstimateAADTsButtonPushed (line 322)
                main_combined_2(app.WorkingDirectoryEditField,option, ...

Error in appdesigner.internal.service.AppManagementService/tryCallback (line 207)
                    callback(app, event);

Error in
matlab.apps.AppBase>@(source,event)tryCallback(appdesigner.internal.service.AppManagementService.instance(),app,callback,requiresEventData,event) 
Error using matlab.ui.control.internal.controller.ComponentController/executeUserCallback (line 262)
Error while evaluating Button PrivateButtonPushedFcn

nansum is part of the Statistics and Machine Learning add-on to MATLAB, which costs $1000 USD...

cczhu commented 5 years ago

Initialized a local WSL git repo under TEPS-dev to track any changes I make to the code. Will add various files to the .gitignore as needed.

cczhu commented 5 years ago

Jan Gläscher uploaded a version of nansum in his NaN Suite. Imported this suite into charles/nansuite and appended the path under main_combined_2.m. Result:

Error using STTC_estimate3 (line 125)
Subscripted assignment dimension mismatch.

Error in main_DoM_new_2012 (line 30)
        parfor iyear=Start_year:End_year

Error in main_combined_2 (line 53)
                    main_DoM_new_2012(str1,direction{idir},FY.Value,EY.Value,base_year,allyearindex.Value,ishort_krig);

Error in App2/EstimateAADTsButtonPushed (line 322)
                main_combined_2(app.WorkingDirectoryEditField,option, ...

Error in appdesigner.internal.service.AppManagementService/tryCallback (line 207)
                    callback(app, event);

Error in
matlab.apps.AppBase>@(source,event)tryCallback(appdesigner.internal.service.AppManagementService.instance(),app,callback,requiresEventData,event) 
Error using matlab.ui.control.internal.controller.ComponentController/executeUserCallback (line 262)
Error while evaluating Button PrivateButtonPushedFcn

which suggests we can't just replace whatever version of nansum Arman used with this one.

cczhu commented 5 years ago

Different tack - try running TEPs-I's executable using Matlab 2016b (Update 6) runtime downloaded from MathWorks. Created TEPS-exerun to run this, since I don't know what intermediate files will be created by the executable that will ruin my testing from above. Will run the exe overnight using all default settings (since the exe is in the root directory I don't even have to change the working directory).

cczhu commented 5 years ago

TEPS-exerun is successfully able to return a series of diagnostic graphs from PRTCS:

image

but soon after plotting the KCOUNT diagnostic figures (manual Figs. C-1 - C-3) as well, it suddenly closes all figures and returns this:

image

Not sure what version of TEPs-I the exe is built from, since the only uncommented msgbox('Error: Model need revision!') in App2.mlapp is under function ARIMAButtonPushed, used for PECOUNT-I.

cczhu commented 5 years ago

Deployed a temporary GitHub repo to house TEPs-I's original source code, mainly as a backup for local files and to communicate with Arman as needed. I'm NOT including it under bdit_teps because:

2021-07-13 update - I deleted the temporary repo, to avoid any confusion from future GitHub spelunkers. Copies of TEPS are available on L: drive.

cczhu commented 5 years ago

There's a program dependency report generator! Dumped reports in PDF form into DependencyReports folder.

cczhu commented 5 years ago

TEPs Dependencies

Here are all the functions and objects that could not be identified by MATLAB's dependency reports.

Emission

export_fig

KCOUNT

LocalSVR

All good.

OptimStation

PECOUNT

All good.

PRTCS

From this it looks like there's no way we can run the Emission or OptimStation modules without the Global Optimization, Financial, Deep Learning and Statistics/ML Toolboxes, though we weren't planning on doing that anyway. We cannot run KCOUNT without the Statistics/ML Toolbox, though. We could probably run PRTCS if Arman could dump corr, nanmean and nansum and send them over.

The UNKNOWN!s will need to be investigated further.

cczhu commented 5 years ago

| Name | Used In | Cost (USD for Annual License) | Necessary? |
| --- | --- | --- | --- |
| Statistics and Machine Learning Toolbox | Emission, KCOUNT, OptimStation, PRTCS | 400 | Yes |
| Parallel Computing Toolbox | KCOUNT, OptimStation, PRTCS | 400 | Maybe |
| Deep Learning Toolbox | Emission | 500 | No |
| Global Optimization Toolbox | OptimStation | 400 | No |
| Financial Toolbox | Emission | 740 | No (maybe Arman can just send it to us) |
| Image Processing Toolbox | export_fig | 400 | No |

aharpalaniTO commented 5 years ago

image

Unfortunately for the POA it looks like we're going to need the same process as for C4C (briefing note + division head and PMMD Director signatures). There must be an easier way.

cczhu commented 5 years ago

Going back to running the executable, Arman noted that we need to add R to the PATH environment variable in order to run Rscript (called from LocalSVR/codes/build_infile_SVR.m). I added this.
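A quick way to confirm that Rscript is visible before launching the executable is sketched below in Python. This is only an illustrative check, not part of TEPs-I: the helper names `find_on_path` and `prepend_to_path` are hypothetical, and the directory passed to `prepend_to_path` would be wherever R's `bin` folder lives on the machine.

```python
import os
import shutil


def find_on_path(executable, path=None):
    """Return the full path of `executable` if it is on PATH, else None."""
    return shutil.which(executable, path=path)


def prepend_to_path(directory, env=None):
    """Return a copy of the environment with `directory` first on PATH.

    Hypothetical helper: it does not modify the current process, it only
    builds an environment mapping that could be passed to a subprocess.
    """
    env = dict(os.environ if env is None else env)
    current = env.get("PATH", "")
    env["PATH"] = directory + (os.pathsep + current if current else "")
    return env


if __name__ == "__main__":
    location = find_on_path("Rscript")
    if location is None:
        print("Rscript not found; add R's bin directory to PATH")
    else:
        print("Rscript found at", location)
```

If the check fails, the mapping returned by `prepend_to_path` can be handed to `subprocess.run(..., env=...)` to test a candidate R directory without changing system settings.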

Then tried to run only PRTCS. This was successful:

image

The message "File not found: run KCOUNT for both directiosn and check working path" is raised by Emission/codes/pos_neg_sum.m, and appears because KCOUNT was not run (there's an equivalent error for not running LocalSVR).

Running both PRTCS and KCOUNT leads to "Error: Model need revision!".

cczhu commented 5 years ago

Moving back to the Matlab 2016 environment, now with all the relevant add-ons listed here, we can successfully run PRTCS:

image

Running KCOUNT afterward leads to:

Error using calcVarMat (line 19)
Index exceeds matrix dimensions.

Error in main_2_2012_min (line 377)
varmat = calcVarMat(kDist, residv);

Error in main_combined_2 (line 60)
                    main_2_2012_min(str2,direction{idir},strcat(path.Value,'KCOUNT\RMsma_2km_neg\'),base_year,lam,bins);

Error in App2/EstimateAADTsButtonPushed (line 322)
                main_combined_2(app.WorkingDirectoryEditField,option, ...

Error in appdesigner.internal.service.AppManagementService/tryCallback (line 207)
                    callback(app, event);

Error in matlab.apps.AppBase>@(source,event)tryCallback(appdesigner.internal.service.AppManagementService.instance(),app,callback,requiresEventData,event)

Error using matlab.ui.control.internal.controller.ComponentController/executeUserCallback (line 262)
Error while evaluating Button PrivateButtonPushedFcn

This error comes from a mismatch between the shape of kDist, 5104 x 5104, and residv, 4982 x 1. From reading main_2_2012_min.m the former is loaded by the line in preProcDist_new.m

kIDs1 = csvread([path name '_obs_' num2str(base_year) '.txt'])

while the latter is populated from kMat2, which is read from

kMat2=csvread(strcat(path,'data_for_fit',num2str(base_year),'.txt'),1);

(At least for 2011) the data_for_fit files behind residv are updated (PRTCS/output_for_kriging2011negative/data_for_fit2011.txt, PRTCS/output_for_kriging2011positive/data_for_fit2011.txt and their copies under KCOUNT/RMsma_2km_neg and KCOUNT/RMsma_2km_pos, respectively). The ids_obs files behind kDist (nominally PRTCS/output_for_kriging2011negative/ids_obs_2011.txt and PRTCS/output_for_kriging2011positive/ids_obs_2011.txt) are not. Looks like we have to manually trigger ishort_krig.Value, which is set by the app.ShortestpathreanalysisCheckBox boolean and passed to main_combined_2 by the app.
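A mismatch like this can be caught before starting KCOUNT by comparing row counts in the two input files. The Python sketch below is a hypothetical pre-flight check, not something TEPs-I provides. It assumes each non-empty row of the ids_obs file corresponds to one station in kDist, and that the row offset of 1 in the csvread call for data_for_fit means one header row is skipped. The function names are made up for illustration.

```python
import csv


def count_rows(path, skip_rows=0):
    """Count non-empty CSV rows in `path`, ignoring the first `skip_rows`."""
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    return max(len(rows) - skip_rows, 0)


def check_kriging_inputs(ids_path, fit_path):
    """Compare station counts in the IDs file and the fit file.

    Returns (n_ids, n_fit, consistent). Assumes one station per row.
    """
    n_ids = count_rows(ids_path)
    # Assumption: the MATLAB code reads the fit file with a row offset
    # of 1, so the first row is treated as a header and skipped here too.
    n_fit = count_rows(fit_path, skip_rows=1)
    return n_ids, n_fit, n_ids == n_fit
```

Calling `check_kriging_inputs` on the ids_obs_2011.txt and data_for_fit2011.txt pair for one direction would return the two counts and a flag saying whether they agree.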

cczhu commented 5 years ago

TEPs-I raw volume data:

SELECT * FROM prj_volume.uoft_centreline_volumes_output
LIMIT 100

cczhu commented 5 years ago

Can now run PRTCS, KCOUNT and LocalSVR in sequence! See GUI panel for settings. image

cczhu commented 5 years ago

Currently can also run PECOUNT-I and PECOUNT-II, but uncertain of which stations Arman flags as Aggregate Stations, and which medium term counts he selects as downstream stations for each aggregate one.

We do know (but haven't run) that the CSV dump of prj_volume.uoft_centreline_volumes_output is parsed into individual station counts using the UNIX script Arman included in the TEPs Manual appendix. This should be fairly straightforward to reproduce using Python (or even Postgres).
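As a rough idea of what the Python version could look like, here is a minimal sketch that groups rows of a CSV dump by station. It is not a port of Arman's UNIX script, which I haven't run. The column name `centreline_id` and the sample columns in the usage note are assumptions for illustration; the real header of the dump would need to be checked.

```python
import csv
import io
from collections import defaultdict


def split_by_station(handle, id_column="centreline_id"):
    """Group rows of a CSV dump by station ID.

    `id_column` is a hypothetical column name; replace it with whatever
    the real dump uses. Returns (fieldnames, {station_id: [row, ...]}).
    """
    reader = csv.DictReader(handle)
    if id_column not in (reader.fieldnames or []):
        raise KeyError(f"column {id_column!r} not found in header")
    groups = defaultdict(list)
    for row in reader:
        groups[row[id_column]].append(row)
    return reader.fieldnames, dict(groups)


def write_station_file(handle, fieldnames, rows):
    """Write one station's rows, with a header, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```

Usage would be along the lines of opening the dump, calling `split_by_station`, then looping over the returned dictionary and calling `write_station_file` once per station ID. An `io.StringIO` object works in place of a file for quick experiments.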

cczhu commented 5 years ago

~~Close examination of the output of PRTCS suggests a bug: for output_PRTCS_2011_negative (and positive), permanent station AADTs are found in Perm_AADT_2011.txt and temporary ones in Temp_AADT_2011.txt. Station ID is the centreline ID extracted from uoft_centreline_volumes_output. IDs in Perm_AADT_2011.txt are also found in Temp_AADT_2011.txt. Permanent and temporary stations are distinguished in lines 161-178 of PRTCS/codes/main_DoM_new_2012.m - all stations are under Ms_abs, while permanent and temporary stations should be separated into DoM_PTC and DoM_STTC, respectively. However, the number of unique centreline IDs in Ms_abs(:,4) is identical to the number under DoM_STTC(:,5) (or MSE(:,2), which is copied from DoM_STTC(:,5)).

Since this bug was only discovered by a close inspection of the code, but a close inspection was also required to fully understand the columns being output by each TEPs module, this suggests we should be converting each module in sequence (starting with PRTCS) rather than using a top-down approach of building the entire test suite out first before beginning the conversion process. This way we can read the code in detail and find any bugs before moving onto the next step. We can still create canned inputs using either Arman's original code or my revised one to feed into another module of TEPs to create test outputs.~~

Update: this is probably just the way Arman names things. As far as I can tell whether a station is permanent is checked by its centreline ID and year within the code, and in some cases permanent count stations are deliberately duplicated across multiple files or variables so they can be used for validation or error estimates.

In any case I've found far worse issues with PRTCS.

cczhu commented 5 years ago

Annoying find - PRTCS/codes/data_prep_kridging.m includes land use data, station counts, and possibly has a different definition of which stations have AADTs (line 161). I'm not certain why it's even part of PRTCS, since data_prep_kridging.m doesn't appear to ingest any data from PRTCS/output_PRTCS_<YEAR>_<DIRECTION>. Perhaps there's an undocumented intermediate step that creates shortest_path.zip which also incorporates the station count data?

Update: this isn't quite true - the land use data was determined separately, but the AADT estimates do come from PRTCS. See the flow chart and description in the wiki.

cczhu commented 5 years ago

Following discussion with Jesse and Aakash:

cczhu commented 5 years ago

With Arman's assistance in person, discovered that I was missing some files under KCOUNT\RMsma_2km_pos (that is on OneDrive, just not downloaded locally). Some notes from our meeting:

image

Action items for me:

cczhu commented 5 years ago

The above run has successfully completed, and we've generated a new set of final_aadt_<YEAR>.csv files. They're not identical to the ones from Arman's OneDrive (in some cases even exceeding the 95% CI estimates provided). I've contacted Arman about this and will update this issue once he responds.

cczhu commented 4 years ago

Odd issue I discovered while running TEPS-dev - some of the PTC IDs in validation_2010.txt files produced by TEPS-exerun are not in TEPS-dev. The missing counts are not included in test_id_negative and test_id_positive in STTC_estimate3.m, so weren't eliminated by data preprocessing. I can't tell if this is an issue with the executable in TEPS-exerun being different than the source code, but at this stage of our development it's probably not fruitful to investigate.
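For reference, this kind of ID comparison can be sketched in a few lines of Python. The layout of validation_2010.txt is an assumption here (comma-delimited, with the station ID in the first column), and `read_ids` and `compare_ids` are hypothetical helper names.

```python
def read_ids(lines, column=0, delimiter=","):
    """Collect the station IDs found in one column of delimited text lines.

    Assumption: IDs sit in `column` (default: the first field) and blank
    lines can be ignored.
    """
    ids = set()
    for line in lines:
        fields = line.strip().split(delimiter)
        if len(fields) > column and fields[column].strip():
            ids.add(fields[column].strip())
    return ids


def compare_ids(lines_a, lines_b, column=0):
    """Return (IDs only in A, IDs only in B), each as a sorted list."""
    ids_a = read_ids(lines_a, column)
    ids_b = read_ids(lines_b, column)
    return sorted(ids_a - ids_b), sorted(ids_b - ids_a)
```

Passing the open TEPS-exerun file as the first argument and the TEPS-dev file as the second would list the IDs that appear in only one of the two runs.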

cczhu commented 4 years ago

Further investigation reveals that this is because TEPS-dev was running on 2006-2013 data, and TEPS-exerun was running on 2006-2016. The additional PTCs from 2013-2016 lead to outlier points on the Observed-Predicted plot:

image

This lends further credence to the discussion in #14 on validation issues.

cczhu commented 4 years ago

A couple of notes on how to run TEPs learned while completing #41: