As of #22, Traffic Prophet has a `'min_counts_in_day'` setting: the minimum number of counts a day must have to be included in `daily_counts`. The setting is applied when reading data from zip files. The Postgres materialized views we're working with, however, already aggregate to daily counts, and their minimum count number is hardcoded in the view scripts. Changing `'min_counts_in_day'` therefore has no effect on data read from Postgres.
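As a rough illustration of the zip-file path, here is a minimal sketch. It assumes each "count" is one timestamped bin record (e.g. a 15-minute bin) and that a day is kept only if it has at least `min_counts_in_day` bins. `aggregate_daily` is a hypothetical helper, not Traffic Prophet's actual reader code.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_daily(bins, min_counts_in_day):
    """Sum (timestamp, count) bin records into daily totals.

    Hypothetical helper: a day is kept only if it contains at least
    min_counts_in_day bin records.
    """
    n_bins = defaultdict(int)   # number of bin records seen per day
    totals = defaultdict(int)   # summed counts per day
    for timestamp, count in bins:
        day = timestamp.date()
        n_bins[day] += 1
        totals[day] += count
    # Drop days with too few bins to be considered complete.
    return {day: total for day, total in totals.items()
            if n_bins[day] >= min_counts_in_day}
```

With `min_counts_in_day=2`, a day with two bins survives while a day with a single bin is dropped. Once data has already been aggregated to daily totals, as in the Postgres views, the per-day bin count is gone, so this check can no longer be applied.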
### Potential solutions

- At the very least, this limitation should be documented.
- We could add an `n_bins` column to the Postgres views, which would allow selecting days by a minimum bin coverage. This might further complicate the ETL pipeline, but should be considered when we refactor `countmatch.reader`.
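If the views exposed an `n_bins` column, the reader could apply `'min_counts_in_day'` to Postgres data too. A minimal sketch, assuming the view returns one row per count and day with a hypothetical `n_bins` field; `DailyCountRow` and `filter_by_coverage` are illustrative names, not existing Traffic Prophet code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DailyCountRow:
    """One row of a daily-count view, assuming a hypothetical n_bins column."""
    count_id: int
    count_date: date
    daily_count: int
    n_bins: int  # number of bins aggregated into daily_count


def filter_by_coverage(rows, min_counts_in_day):
    """Keep only rows whose day had at least min_counts_in_day bins."""
    return [row for row in rows if row.n_bins >= min_counts_in_day]
```

The same condition could instead be pushed into the SQL query (a `WHERE n_bins >= ...` clause), which avoids transferring rows that would be discarded anyway, at the cost of coupling the query to the setting.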