Chicago / west-nile-virus-predictions

Algorithm to predict repeated positive results for West Nile Virus for mosquitoes captured in traps across Chicago.
MIT License

Issue with date regularization, should be beginning of week not end of week #47

geneorama closed this issue 7 years ago

geneorama commented 7 years ago

Today, while looking for the most recent data in WindyGrid, I noticed that the most recent week was missing. The cause is artificially created missing values: when calculating the week, I was inadvertently assigning some observations a date in the future.

This only affects the data for the first couple of days after it arrives, which is why I didn't notice it earlier in the season (there was a lag in reporting when I was first testing the data pipeline).

For example, for week 34 the actual collection date was 2017-08-24, but the date being used in the model was 2017-08-28 (the end of the week). After this change the date being used in the model will be 2017-08-21.
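As a rough illustration of the week 34 example, here is a minimal Python sketch. It is not the repository's code: the helper names `week_start` and `week_end` are hypothetical, and it assumes weeks start on Monday, which is consistent with the dates quoted above.

```python
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday on or before d (assumed beginning of the week)."""
    return d - timedelta(days=d.weekday())


def week_end(d: date) -> date:
    """Return the Monday after d's week, i.e. the future-dated value."""
    return week_start(d) + timedelta(days=7)


collected = date(2017, 8, 24)       # actual collection date from the example
print(collected.isocalendar()[1])   # 34
print(week_start(collected))        # 2017-08-21
print(week_end(collected))          # 2017-08-28
```

On the collection day itself, `week_end` is four days in the future while `week_start` is already in the past, which fits the symptom showing up only while the data is new.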

Of course I would prefer to use the actual collection date, but in many weeks there are multiple collection dates. If I aggregated the data by week and date, many weeks would split into two observations. The calculated date is therefore mostly a convenience, and a key for attaching historical weather data.
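The splitting effect can be sketched in a few lines of Python. Everything here is an assumption for illustration: the trap IDs, dates, and counts are made up, and `week_start` and `aggregate` are hypothetical helpers rather than functions from this project.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday on or before d (assumed beginning of the week)."""
    return d - timedelta(days=d.weekday())


# Made-up records: (trap id, collection date, positive pools).
records = [
    ("T001", date(2017, 8, 22), 0),
    ("T001", date(2017, 8, 24), 1),
    ("T002", date(2017, 8, 24), 0),
]


def aggregate(rows, key):
    """Sum positive pools per group, where key builds the grouping tuple."""
    totals = defaultdict(int)
    for trap, collected, positives in rows:
        totals[key(trap, collected)] += positives
    return dict(totals)


# One observation per trap per week.
by_week = aggregate(records, lambda trap, d: (trap, week_start(d)))

# Adding the collection date to the key splits T001's week in two.
by_week_and_date = aggregate(records, lambda trap, d: (trap, week_start(d), d))

print(len(by_week), len(by_week_and_date))
```

Keying on the calculated week date keeps one row per trap per week, and that single date is what weather data would be joined on.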

This change does not appear to affect model performance.