Pythonize PRTCS

cczhu commented 4 years ago

Given the limited number of weeks we have to get some version of the volume model running, our priority must now be to create Python versions of PRTCS, KCOUNT and LSVR. Of these, PRTCS is the only one that uses entirely novel algorithms.

To create a Python version of PRTCS, we need to:

[x] Reproduce and simplify the behaviour of STTC_estimate3.m preprocessing.
[ ] Reproduce the growth rate estimating processes in PTCWEEK.m and PTCYEAR.m.
[ ] Reproduce the Delaunay triangulation scheme in nearestneighbour.m for linking short-term and permanent count stations.
[ ] Reproduce the main script, main_DoM_new_2012.m.
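The behaviour of nearestneighbour.m is not described here, so the following is only a hypothetical sketch of one Delaunay-based linking scheme. It triangulates the permanent count stations with `scipy.spatial.Delaunay`, links each short-term station to the three permanent stations at the corners of the triangle that contains it, and falls back to the single nearest permanent station for short-term stations outside the convex hull. The function name `link_stations`, the fallback rule, and the assumption of planar (projected) coordinates are all illustrative, not taken from the MATLAB code.

```python
import numpy as np
from scipy.spatial import Delaunay


def link_stations(permanent_xy, short_term_xy):
    """Return, for each short-term station, indices of candidate permanent stations.

    Hypothetical scheme (not necessarily what nearestneighbour.m does):
    triangulate the permanent stations, then link each short-term station
    to the vertices of the Delaunay triangle containing it. Stations outside
    the convex hull fall back to the single nearest permanent station.
    """
    permanent_xy = np.asarray(permanent_xy, dtype=float)
    short_term_xy = np.asarray(short_term_xy, dtype=float)

    # Needs at least three non-collinear permanent stations.
    tri = Delaunay(permanent_xy)

    # Index of the containing triangle for each point, or -1 if outside the hull.
    simplex_ids = tri.find_simplex(short_term_xy)

    links = []
    for point, s in zip(short_term_xy, simplex_ids):
        if s >= 0:
            # The three permanent stations at the triangle's corners.
            links.append(sorted(tri.simplices[s].tolist()))
        else:
            # Fallback: nearest permanent station by Euclidean distance.
            dist = np.linalg.norm(permanent_xy - point, axis=1)
            links.append([int(np.argmin(dist))])
    return links
```

For example, with permanent stations at (0, 0), (10, 0) and (5, 10), a short-term station at (5, 3) falls inside the single triangle and is linked to all three, while one at (20, 20) lies outside the hull and is linked only to the closest station, (5, 10).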

cczhu commented 4 years ago

PRTCS will now be called countmatch.

Created a small function to read 15_min_counts_<YEAR>.zip into pandas tables. Experimented a bit with optimizing timestamp conversion times with Pendulum and ciso8601, and found that pd.read_csv with infer_datetime_format=True is as fast as or faster than using either package with pd.DataFrame.apply. Will write up a short notebook about this at some point.
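A minimal sketch of such a reader, assuming each archive holds headerless CSV files with a hypothetical four-column layout (`centreline_id`, `direction`, `count_date`, `count`). The real schema of 15_min_counts_<YEAR>.zip and the function name `read_15min_counts` are illustrative, not taken from the repository. The sketch relies on `parse_dates` alone rather than `infer_datetime_format=True`, because pandas 2.0 deprecated that flag after making strict format inference the default.

```python
import io
import zipfile

import pandas as pd

# Hypothetical column layout; the real 15_min_counts_<YEAR>.zip schema may differ.
COLUMNS = ["centreline_id", "direction", "count_date", "count"]


def read_15min_counts(source):
    """Read every CSV inside a 15-minute count zip into one DataFrame.

    `source` may be a path or a file-like object holding the zip archive.
    """
    frames = []
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if not name.endswith(".csv"):
                continue
            with zf.open(name) as fh:
                # parse_dates converts the timestamp column in one vectorized
                # pass instead of parsing it row by row with .apply.
                frames.append(
                    pd.read_csv(fh, names=COLUMNS, header=None,
                                parse_dates=["count_date"])
                )
    return pd.concat(frames, ignore_index=True)
```

On pandas 1.x, adding `infer_datetime_format=True` to the `read_csv` call reproduces the setup described above.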

Using infer_datetime_format=True should speed up zip file reading by >20x.
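One rough way to check a speedup of this kind is to time row-by-row parsing against vectorized parsing on synthetic timestamps. This sketch uses the standard library's `datetime.strptime` through `Series.apply` as a stand-in for Pendulum or ciso8601, so it does not reproduce the original measurements; the series length (one year of 15-minute bins) and the timestamp format are assumptions.

```python
import timeit
from datetime import datetime

import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"

# Synthetic stand-in for one year of 15-minute bins (365 * 96 = 35,040 rows).
stamps = pd.Series(
    pd.date_range("2010-01-01", periods=365 * 96, freq="15min").strftime(FMT)
)


def parse_rowwise(s):
    # One Python-level strptime call per row, analogous to running
    # Pendulum or ciso8601 through Series.apply.
    return s.apply(lambda x: datetime.strptime(x, FMT))


def parse_vectorized(s):
    # pandas converts the whole column in one vectorized call.
    return pd.to_datetime(s, format=FMT)


if __name__ == "__main__":
    t_row = timeit.timeit(lambda: parse_rowwise(stamps), number=3)
    t_vec = timeit.timeit(lambda: parse_vectorized(stamps), number=3)
    print(f"row-wise: {t_row:.3f} s, vectorized: {t_vec:.3f} s, "
          f"ratio: {t_row / t_vec:.1f}x")
```

The measured ratio will depend on the machine, the pandas version and the timestamp format, so it is a sanity check rather than a reproduction of the figure above.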

cczhu commented 4 years ago

This is far too large for a single issue, so per @aharpalaniTO's suggestion I have split the remaining work into issues #11 onward.

CityofToronto / bdit_traffic_prophet

Pythonize PRTCS #8