Closed: cczhu closed this issue 3 years ago.
After acquiring MATLAB R2016b, and installing Gnuplot 5.2.2 and R 3.6.1 (and RStudio), I can now access the TEPs-I GUI under `KCOUNT/codes/App2.mlapp`. Not sure if it's supposed to live there. More worryingly, attempting to run the model using the GUI with the default settings (except Working Directory, which is `C:\Users\czhu5\Documents\VolumeModel\TEPS-dev\`) leads to an error message saying that `nansum` is missing:
```
Error using STTC_estimate3 (line 116)
Undefined function 'nansum' for input arguments of type 'double'.
Error in main_DoM_new_2012 (line 30)
parfor iyear=Start_year:End_year
Error in main_combined_2 (line 50)
main_DoM_new_2012(str1,direction{idir},FY.Value,EY.Value,base_year,allyearindex.Value,ishort_krig);
Error in App2/EstimateAADTsButtonPushed (line 322)
main_combined_2(app.WorkingDirectoryEditField,option, ...
Error in appdesigner.internal.service.AppManagementService/tryCallback (line 207)
callback(app, event);
Error in matlab.apps.AppBase>@(source,event)tryCallback(appdesigner.internal.service.AppManagementService.instance(),app,callback,requiresEventData,event)
Error using matlab.ui.control.internal.controller.ComponentController/executeUserCallback (line 262)
Error while evaluating Button PrivateButtonPushedFcn
```
`nansum` is part of the Statistics and Machine Learning Toolbox add-on to MATLAB, which costs $1000 USD...
Initialized a local WSL git repo under `TEPS-dev` to track any changes I make to the code. Will add various files to the `.gitignore` as needed.

Jan Gläscher uploaded a version of `nansum` in his NaN Suite. Imported this suite into `charles/nansuite` and appended the path under `main_combined_2.m`. Result:
```
Error using STTC_estimate3 (line 125)
Subscripted assignment dimension mismatch.
Error in main_DoM_new_2012 (line 30)
parfor iyear=Start_year:End_year
Error in main_combined_2 (line 53)
main_DoM_new_2012(str1,direction{idir},FY.Value,EY.Value,base_year,allyearindex.Value,ishort_krig);
Error in App2/EstimateAADTsButtonPushed (line 322)
main_combined_2(app.WorkingDirectoryEditField,option, ...
Error in appdesigner.internal.service.AppManagementService/tryCallback (line 207)
callback(app, event);
Error in matlab.apps.AppBase>@(source,event)tryCallback(appdesigner.internal.service.AppManagementService.instance(),app,callback,requiresEventData,event)
Error using matlab.ui.control.internal.controller.ComponentController/executeUserCallback (line 262)
Error while evaluating Button PrivateButtonPushedFcn
```
which suggests we can't simply swap in this version for whatever version of `nansum` Arman used.
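For reference, the toolbox `nansum` treats NaN entries as zeros before summing, so any drop-in replacement has to match that behaviour exactly. A minimal Python/NumPy sketch of the semantics we need (function name and example values are mine, not from TEPs):

```python
import numpy as np

def nan_sum(x, axis=0):
    """Sum along an axis, treating NaN entries as zero (toolbox-nansum-style)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.isnan(x), 0.0, x).sum(axis=axis)

col = np.array([1.0, np.nan, 2.0])
print(nan_sum(col))  # 3.0, whereas a plain sum would return nan
```

Note that an all-NaN column sums to 0.0 under these semantics; a replacement that instead returns NaN for all-NaN input would change downstream results.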
Different tack: try running TEPs-I's executable using the MATLAB 2016b (Update 6) runtime downloaded from MathWorks. Created `TEPS-exerun` to run this, since I don't know what intermediate files the executable will create that might ruin my testing from above. Will run the exe overnight using all default settings (since the exe is in the root directory, I don't even have to change the working directory).
`TEPS-exerun` successfully returns a series of diagnostic graphs from PRTCS, but soon after plotting the KCOUNT diagnostic figures (manual Figs. C-1 to C-3) it suddenly closes all figures and returns this:
Not sure which version of TEPs-I the exe was built from, since the only uncommented `msgbox('Error: Model need revision!')` in `App2.mlapp` is under `function ARIMAButtonPushed`, which is used for PECOUNT-I.
Deployed a temporary GitHub repo to house TEPs-I's original source code, mainly as a backup for local files and to communicate with Arman as needed. I'm NOT including it under `bdit_teps` because:
2021-07-13 update - I deleted the temporary repo, to avoid any confusion from future GitHub spelunkers. Copies of TEPS are available on L: drive.
There's a program dependency report generator! Dumped reports in PDF form into the `DependencyReports` folder. Here are all the functions and objects that could not be identified by MATLAB's dependency reports:
- `boxcox`: Financial Toolbox; Box-Cox transformation
- `gregnet2b`: included as a .mat
- `nanmean`: Statistics and Machine Learning Toolbox; mean, remove NaN first
- `nansum`: Statistics and Machine Learning Toolbox; sum, remove NaN first
- `newff`: Deep Learning Toolbox; deprecated version of `feedforwardnet`
- `plotregression`: Deep Learning Toolbox; plot linear regression
- `regstats`: Statistics and Machine Learning Toolbox; regression diagnostics
- `sim`: Deep Learning Toolbox; simulate NN
- `train`: Deep Learning Toolbox; train NN
- `imresize`: Image Processing Toolbox; resizes image
- `cholcov`: Statistics and Machine Learning Toolbox; Cholesky-like covariance decomposition
- `corr`: Statistics and Machine Learning Toolbox; pairwise linear correlation coefficient
- `dataset`: Statistics and Machine Learning Toolbox; construct dataset array
- `dummyvar`: Statistics and Machine Learning Toolbox; create dummy variable
- `fitlm`: Statistics and Machine Learning Toolbox; fit linear regression model to dataset array
- `gcp`: Parallel Computing Toolbox
- `grpstats`: Statistics and Machine Learning Toolbox; summary statistics organized by group
- `mat2dataset`: Statistics and Machine Learning Toolbox; convert matrix to dataset
- `nansum`: Statistics and Machine Learning Toolbox; sum, remove NaN first
- `nominal`: Statistics and Machine Learning Toolbox; discrete, nonnumeric values
- `predict`: Statistics and Machine Learning Toolbox; predict with `fitlm`
- `pDist`: UNKNOWN! (Though part of `find_ids_for_pred`, which appears to be unused)
- `quantile`: Statistics and Machine Learning Toolbox; quantiles of a data set
- `weights`: UNKNOWN!

All good.

- `corr`: Statistics and Machine Learning Toolbox; pairwise linear correlation coefficient
- `dataset`: Statistics and Machine Learning Toolbox; construct dataset array
- `dummyvar`: Statistics and Machine Learning Toolbox; create dummy variable
- `fitscalingprop`: Global Optimization Toolbox; genetic algorithm option
- `fitlm`: Statistics and Machine Learning Toolbox; fit linear regression model to dataset array
- `fminsearchbnd`: UNKNOWN! (Though the documentation says it'll default to `fminsearch`, which we do have)
- `gaplotbestf`: Global Optimization Toolbox; genetic algorithm best fit plot
- `grpstats`: Statistics and Machine Learning Toolbox; summary statistics organized by group
- `mat2dataset`: Statistics and Machine Learning Toolbox; convert matrix to dataset
- `nansum`: Statistics and Machine Learning Toolbox; sum, remove NaN first
- `nominal`: Statistics and Machine Learning Toolbox; discrete, nonnumeric values
- `optimoptions`: Global Optimization Toolbox; genetic algorithm options
- `parcluster`: Parallel Computing Toolbox
- `parpool`: Parallel Computing Toolbox
- `predict`: Statistics and Machine Learning Toolbox; predict with `fitlm`
- `pDist`: UNKNOWN!
- `quantile`: Statistics and Machine Learning Toolbox; quantiles of a data set
- `saveProfile`: Parallel Computing Toolbox
- `selectionstochasticuniform`: Global Optimization Toolbox; genetic algorithm option
- `selectiontournament`: Global Optimization Toolbox; genetic algorithm option
- `selectionuniform`: Global Optimization Toolbox; genetic algorithm option

All good.

- `cleanUpUrl`: UNKNOWN!
- `corr`: Statistics and Machine Learning Toolbox; pairwise linear correlation coefficient
- `gcp`: Parallel Computing Toolbox
- `nanmean`: Statistics and Machine Learning Toolbox; mean, remove NaN first
- `nansum`: Statistics and Machine Learning Toolbox; sum, remove NaN first

From this it looks like there's no way we can run the Emission or OptimStation modules without the Global Optimization, Financial, Deep Learning and Statistics/ML Toolboxes, though we weren't planning on doing that anyway. We cannot run KCOUNT without the Statistics/ML Toolbox, though. We could probably run PRTCS if Arman could dump `corr`, `nanmean` and `nansum` and send them over.

The UNKNOWN!s will need to be investigated further.
Name | Used In | Cost (USD for Annual License) | Necessary? |
---|---|---|---|
Statistics and Machine Learning Toolbox | Emission, KCOUNT, OptimStation, PRTCS | 400 | Yes |
Parallel Computing Toolbox | KCOUNT, OptimStation, PRTCS | 400 | Maybe |
Deep Learning Toolbox | Emission | 500 | No |
Global Optimization Toolbox | OptimStation | 400 | No |
Financial Toolbox | Emission | 740 | No (maybe Arman can just send it to us) |
Image Processing Toolbox | export_fig | 400 | No |
Unfortunately for the POA it looks like we're going to need the same process as for C4C (briefing note + division head and PMMD Director signatures). There must be an easier way.
Going back to running the executable, Arman noted that we need to add R to the PATH environment variable in order to run `Rscript` (see `LocalSVR/codes/build_infile_SVR.m`). I added this.
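Since a missing `Rscript` fails in a non-obvious way mid-run, a quick pre-flight check can confirm the environment before launching the model. A sketch in Python (the check itself is mine, not part of TEPs):

```python
import shutil

def find_rscript():
    """Return the full path to Rscript if it's on the PATH, else None."""
    return shutil.which("Rscript")

path = find_rscript()
if path is None:
    print("Rscript not found; add R's bin directory to the PATH environment variable")
else:
    print("Rscript found at", path)
```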
Then tried to run only PRTCS. This was successful:

The `File not found: run KCOUNT for both directiosn and check working path` message comes from `Emission/codes/pos_neg_sum.m`, and results from not running KCOUNT (there's an equivalent error for not running LocalSVR).
Running both PRTCS and KCOUNT leads to "Error: Model need revision!".
Moving back to the MATLAB 2016 environment, now with all the relevant add-ons listed here, we can successfully run PRTCS:
Running KCOUNT afterward leads to:
```
Error using calcVarMat (line 19)
Index exceeds matrix dimensions.
Error in main_2_2012_min (line 377)
varmat = calcVarMat(kDist, residv);
Error in main_combined_2 (line 60)
main_2_2012_min(str2,direction{idir},strcat(path.Value,'KCOUNT\RMsma_2km_neg\'),base_year,lam,bins);
Error in App2/EstimateAADTsButtonPushed (line 322)
main_combined_2(app.WorkingDirectoryEditField,option, ...
Error in appdesigner.internal.service.AppManagementService/tryCallback (line 207)
callback(app, event);
Error in matlab.apps.AppBase>@(source,event)tryCallback(appdesigner.internal.service.AppManagementService.instance(),app,callback,requiresEventData,event)
Error using matlab.ui.control.internal.controller.ComponentController/executeUserCallback (line 262)
Error while evaluating Button PrivateButtonPushedFcn
```
This error comes from a mismatch between the shapes of `kDist` (5104 × 5104) and `residv` (4982 × 1). From reading `main_2_2012_min.m`, the former is loaded via the line in `preProcDist_new.m`

`kIDs1 = csvread([path name '_obs_' num2str(base_year) '.txt'])`

while the latter is populated from `kMat2`, which is read from

`kMat2=csvread(strcat(path,'data_for_fit',num2str(base_year),'.txt'),1);`

(At least for 2011) the `data_for_fit` files are updated (`PRTCS/output_for_kriging2011negative/data_for_fit2011.txt`, `PRTCS/output_for_kriging2011positive/data_for_fit2011.txt`, and their copies under `KCOUNT/RMsma_2km_neg` and `KCOUNT/RMsma_2km_pos`, respectively), while the `ids_obs` files (nominally `PRTCS/output_for_kriging2011negative/ids_obs_2011.txt` and `PRTCS/output_for_kriging2011positive/ids_obs_2011.txt`) are not. Looks like we have to manually trigger `ishort_krig.Value`, which is toggled by the `app.ShortestpathreanalysisCheckBox` boolean and passed to `main_combined_2` by the app.
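This kind of stale-input mismatch could be caught before KCOUNT runs by comparing the row counts of the two files up front. A hedged Python sketch (file layouts assumed from the `csvread` calls above; the helper and example strings are mine):

```python
import csv
import io

def row_count(text, skip_header=0):
    """Count data rows in delimited text, mirroring csvread's optional row offset."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    return len(rows) - skip_header

# Illustrative stand-ins for ids_obs_<YEAR>.txt and data_for_fit<YEAR>.txt.
ids_obs = "1,10\n2,20\n3,30\n"
data_for_fit = "col_a,col_b\n1,0.5\n2,0.7\n"

n_ids = row_count(ids_obs)
n_fit = row_count(data_for_fit, skip_header=1)  # csvread(..., 1) skips one header row
if n_ids != n_fit:
    print(f"Mismatch: {n_ids} observed IDs vs {n_fit} fit rows; rerun with shortest-path reanalysis")
```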
TEPs-I raw volume data:

```sql
SELECT * FROM prj_volume.uoft_centreline_volumes_output
LIMIT 100
```
Can now run PRTCS, KCOUNT and LocalSVR in sequence! See GUI panel for settings.
Currently can also run PECOUNT-I and PECOUNT-II, but I'm uncertain which stations Arman flags as Aggregate Stations, and which medium-term counts he selects as downstream stations for each aggregate one.
We do know (but haven't run) that the CSV dump of `prj_volume.uoft_centreline_volumes_output` is parsed into individual station counts using the UNIX script Arman included in the TEPs Manual appendix. This should be fairly straightforward to reproduce in Python (or even Postgres).
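A hedged sketch of what that reproduction could look like in Python (the column names here are guesses for illustration, not the actual schema of `uoft_centreline_volumes_output`):

```python
import csv
import io
from collections import defaultdict

def split_by_station(csv_text, id_field="centreline_id"):
    """Group rows of the volume dump by station (centreline) ID."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[id_field]].append(row)
    return groups

# Hypothetical dump excerpt; real runs would read the CSV file exported from Postgres.
dump = (
    "centreline_id,count_time,volume\n"
    "101,2011-01-01 00:00,12\n"
    "202,2011-01-01 00:00,7\n"
    "101,2011-01-01 00:15,15\n"
)
groups = split_by_station(dump)
print(sorted(groups), [len(groups[k]) for k in sorted(groups)])  # ['101', '202'] [2, 1]
```

Each group would then be written to its own per-station text file, matching the output of Arman's shell script.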
~~Close examination of the output of PRTCS suggests a bug: for `output_PRTCS_2011_negative` (and positive), permanent station AADTs are found in `Perm_AADT_2011.txt` and temporary ones in `Temp_AADT_2011.txt`. The station ID is the centreline ID extracted from `uoft_centreline_volumes_output`. IDs in `Perm_AADT_2011.txt` are also found in `Temp_AADT_2011.txt`. Permanent and temporary stations are distinguished in lines 161-178 of `PRTCS/codes/main_DoM_new_2012.m`: all stations are under `Ms_abs`, while permanent and temporary stations should be separated into `DoM_PTC` and `DoM_STTC`, respectively. However, the number of unique centreline IDs in `Ms_abs(:,4)` is identical to the number under `DoM_STTC(:,5)` (or `MSE(:,2)`, which is copied from `DoM_STTC(:,5)`).
Since this bug was only discovered by close inspection of the code, and close inspection was also required to fully understand the columns output by each TEPs module, we should convert each module in sequence (starting with PRTCS) rather than take a top-down approach of building the entire test suite before beginning the conversion. This way we can read the code in detail and find any bugs before moving on to the next step. We can still create canned inputs, using either Arman's original code or my revised one, to feed into another module of TEPs to create test outputs.~~
Update: this is probably just the way Arman names things. As far as I can tell whether a station is permanent is checked by its centreline ID and year within the code, and in some cases permanent count stations are deliberately duplicated across multiple files or variables so they can be used for validation or error estimates.
In any case I've found far worse issues with PRTCS.
Annoying find: `PRTCS/codes/data_prep_kridging.m` includes land use data and station counts, and possibly has a different definition of which stations have AADTs (line 161). I'm not certain why it's even part of PRTCS, since `data_prep_kridging.m` doesn't appear to ingest any data from `PRTCS/output_PRTCS_<YEAR>_<DIRECTION>`. Perhaps there's an undocumented intermediate step that creates `shortest_path.zip` and also incorporates the station count data?
Update: this isn't quite true - the land use data was determined separately, but the AADT estimates do come from PRTCS. See the flow chart and description in the wiki.
Following discussion with Jesse and Aakash, the goal is to reproduce `aadt_output_files/final_aadt_<YEAR>.csv` to acceptable accuracy, starting from the `PRTCS/<POS_OR_NEG>/15min_counts<YEAR>.zip` files.

With Arman's assistance in person, discovered that I was missing some files under `KCOUNT\RMsma_2km_pos` (they're on OneDrive, just not downloaded locally). Some notes from our meeting:

- `PRTCS/<POS_OR_NEG>/15min_counts<YEAR>.zip` folders: these are created simply by dividing up our raw data. Arman has tested the module's predictive accuracy (see his TRC_2018_1199 paper), but hasn't tried running the remainder of TEPs with augmented data.
- `15min_counts<YEAR>.zip` files have a dummy number in their filenames because Arman was, for another project, working with multiple sensors per centreline segment. Likewise, the first column of each txt file within each zip is a dummy column from the Unix shell script.
- `re<CENTRELINE>_<DUMMY>.txt` inside the 15-min count data comes from HW401. Except for the difference in filename, these are formatted the same way as the other files.
- `max_counts` and `min_counts`, set in lines 113-296 of `KCOUNT/codes/main_2_2012_min.m`, are not used by KCOUNT, and so can be ignored. They were originally limiters on the number of counts per road type.
- Arman will send `data_prep_kriging.m` over e-mail.
- `shortest_path.zip`: Arman is willing to help us recreate this.

Action items for me:
The above run has successfully completed, and we've generated a new set of final_aadt_<YEAR>.csv
files. They're not identical to the ones from Arman's OneDrive (in some cases even exceeding the 95% CI estimates provided). I've contacted Arman about this and will update this issue once he responds.
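The comparison against Arman's reference outputs was along these lines: flag any station whose newly generated AADT falls outside the reference 95% CI. A sketch (the per-station data structures here are assumptions, not the actual `final_aadt_<YEAR>.csv` layout):

```python
def outside_ci(ours, reference):
    """Return station IDs whose new AADT falls outside the reference 95% CI."""
    flagged = []
    for sid, (aadt, lo, hi) in reference.items():
        new = ours.get(sid)
        if new is not None and not (lo <= new <= hi):
            flagged.append(sid)
    return flagged

# Hypothetical per-station values: reference maps ID -> (aadt, ci_low, ci_high).
reference = {101: (1000.0, 900.0, 1100.0), 202: (500.0, 450.0, 550.0)}
ours = {101: 1250.0, 202: 510.0}
print(outside_ci(ours, reference))  # [101]
```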
Odd issue I discovered while running TEPS-dev: some of the PTC IDs in the `validation_2010.txt` files produced by TEPS-exerun are not in TEPS-dev. The missing counts are not included in `test_id_negative` and `test_id_positive` in `STTC_estimate3.m`, so they weren't eliminated by data preprocessing. I can't tell if this is an issue with the TEPS-exerun executable differing from the source code, but at this stage of our development it's probably not fruitful to investigate.
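Checking which PTC IDs appear in one run but not the other comes down to a set difference; a sketch, with the `validation_2010.txt` layout assumed to carry the station ID in its first delimited column:

```python
def id_set(lines, col=0, sep=","):
    """Extract the set of station IDs from one column of delimited text lines."""
    return {line.split(sep)[col].strip() for line in lines if line.strip()}

# Hypothetical excerpts from the two runs' validation files.
exerun = ["101,0.91", "202,0.85", "303,0.78"]
dev = ["101,0.90", "202,0.86"]
missing = sorted(id_set(exerun) - id_set(dev))
print(missing)  # ['303']
```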
Further investigation reveals that this is because TEPS-dev was running on 2006-2013 data, and TEPS-exerun was running on 2006-2016. The additional PTCs from 2013-2016 lead to outlier points on the Observed-Predicted plot:
This lends further credence to the discussion in #14 on validation issues.
A couple of notes on how to run TEPs, learned while completing #41:

- Both "Estimate AADTs" and "Estimate Vehicle speed" run the same `main_combined_2` code in the backend. "Estimate AADTs" explicitly unchecks the "Vehicle speed" box, while "Estimate Vehicle speed" unchecks the "PRTCS", "KCOUNT", and "LocalSVR" boxes.
- Copy `Emission/inputs/*.csv` and `Emission/outputs/*.csv` into the `EED/inputs/` and `EED/outputs/` folders.
Run TEPs-I to predict volumes, and retrain TEPs-I to see if we can reproduce the same predictions.
MATLAB should not be required here.