CityofToronto / bdit_traffic_prophet

Suite of algorithms for predicting average daily traffic on Toronto streets
GNU General Public License v3.0
1 stars 1 forks source link

Separate out processing permanent counts from countmatch.reader #32

Closed cczhu closed 4 years ago

cczhu commented 4 years ago

Based on comments in #25 and discussions with @aharpalaniTO, we'll be revising how we treat permanent counts. To do this, we can no longer identify and process permanent count locations while reading in data year-by-year, since data imputation and outlier detection for permanent count stations requires data from other years.

To resolve this:

cczhu commented 4 years ago

Had a rethink of this task, and now realize why Arman was reluctant to start mucking around with data imputation of permanent count stations, especially if he already had PECOUNT working to fill in missing ~2-3 months of data.

cczhu commented 4 years ago

Revised class design:

Revised work plan:

Outstanding question: if DerivedVals and GrowthFactor classes have lots of method-specific flags, how should we control for these in config.py? Nested dicts?

cczhu commented 4 years ago

As I'm refactoring to allow for imputation and other multi-year preprocessing of permanent counts, I've created a design where all data from a count location across multiple years are stored in the same object. The amount of data taken at a location can drastically change from year to year, and the permanent count criteria are defined in terms of a single year's data, so currently each location's data is checked year by year using PermCountProcessor.partition_years. The output of this function is perm_years, a list of years that meet the permanent count criteria. Once a PermCount object is created for the count location, perm_years is stored within it, and is used in a number of subsequent methods for calculating derived values and growth factors.

There are a number of issues with this design:

Alternative design:

cczhu commented 4 years ago

Decided to adopt points 1 and 3, but not 2. perm_years is now an instance property, but it is still initialized by PermCountProcessor. This is because practically everything else is initialized by PermCountProcessor (all derived values and growth factors, for example), so it would be strange to use an entirely different pattern to attempt initialization of perm_years, especially since it needs to communicate back to its parent whether it succeeded in making a permanent count. We can always refactor later.

cczhu commented 4 years ago

Resolved by #37