Chicago / west-nile-virus-predictions

Algorithm to predict repeated positive results for West Nile Virus for mosquitoes captured in traps across Chicago.
MIT License

New bug because of missing values in NOAA data #41

Closed: geneorama closed this issue 7 years ago

geneorama commented 7 years ago

The NOAA daily summaries often contain missing values. The maximum temperature is usually a good field to check: it is the most basic piece of historical weather information, so it works well as a proxy for "is this row missing?"
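A minimal sketch of that proxy check, assuming each daily summary is a Python dict with a hypothetical `tmax` key and `None` for a missing value (the rows are made-up toy data, not the repository's code):

```python
# Toy daily rows (made-up values); None marks a missing value.
rows = [
    {"date": "2017-07-01", "tmax": 88.0, "tmin": 70.0},
    {"date": "2017-07-02", "tmax": None, "tmin": None},
    {"date": "2017-07-03", "tmax": 91.0, "tmin": None},  # only tmin is missing
]

def row_looks_missing(row: dict) -> bool:
    """Proxy check: treat the whole row as missing when tmax is missing."""
    return row.get("tmax") is None

print([r["date"] for r in rows if row_looks_missing(r)])  # ['2017-07-02']
```

The third row passes the proxy even though `tmin` is absent, which is the blind spot of checking a single field.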

Yesterday the minimum temperature was missing even though the maximum temperature was present. The missing values cause an error in the QR decomposition that runs behind the scenes when the code tests for linear combinations in the data.
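The issue doesn't show the linear-combination test itself. As an illustration only, here is a Python sketch of the underlying idea: orthogonalize the columns one at a time (modified Gram-Schmidt, which yields a QR factorization) and flag any column whose residual is numerically zero, meaning it is spanned by earlier columns. A single NaN would poison every dot product it touches, so the sketch rejects incomplete input up front. The column names `tmax`, `tmin`, and `tavg` and their values are hypothetical.

```python
import math
from typing import List, Sequence

def find_linear_combos(columns: Sequence[Sequence[float]], tol: float = 1e-9) -> List[int]:
    """Return indices of columns that are numerically linear combinations of
    earlier columns, found by modified Gram-Schmidt (a QR factorization)."""
    for col in columns:
        if any(x is None or math.isnan(x) for x in col):
            # NaN would propagate through every projection below.
            raise ValueError("missing value in input; QR needs complete data")
    basis: List[List[float]] = []  # orthonormal columns of Q found so far
    dependent: List[int] = []
    for j, col in enumerate(columns):
        v = [float(x) for x in col]
        for q in basis:
            # Subtract the component of v that lies along q.
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        residual = math.sqrt(sum(a * a for a in v))
        size = math.sqrt(sum(float(x) ** 2 for x in col))
        if residual <= tol * max(size, 1.0):
            dependent.append(j)  # nothing left: spanned by earlier columns
        else:
            basis.append([a / residual for a in v])
    return dependent

# Toy columns (made-up values): tavg is exactly the midpoint of tmax and tmin.
tmax = [88.0, 91.0, 85.0, 79.0]
tmin = [70.0, 72.0, 66.0, 60.0]
tavg = [(hi + lo) / 2 for hi, lo in zip(tmax, tmin)]
print(find_linear_combos([tmax, tmin, tavg]))  # [2]
```

Because there is no column pivoting, the result depends on column order: of two dependent columns, the later one is flagged.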

So I'm updating the code to check for missing values in all the fields we use, or to fix this some other way.
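One way to check every field instead of a single proxy is a complete-case filter. This is a sketch assuming dict rows where either `None` or NaN marks a missing value; `USED_FIELDS` is a hypothetical list, since the issue doesn't name the fields the model uses.

```python
import math
from typing import Iterable, List, Optional, Sequence

def is_missing(value: Optional[float]) -> bool:
    """Treat both None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def complete_rows(rows: Iterable[dict], fields: Sequence[str]) -> List[dict]:
    """Keep only rows in which every listed field is present."""
    return [r for r in rows if not any(is_missing(r.get(f)) for f in fields)]

# Hypothetical field list; the real set depends on the model.
USED_FIELDS = ["tmax", "tmin"]

rows = [
    {"date": "2017-07-01", "tmax": 88.0, "tmin": 70.0},
    {"date": "2017-07-02", "tmax": None, "tmin": None},
    {"date": "2017-07-03", "tmax": 91.0, "tmin": float("nan")},
]
print([r["date"] for r in complete_rows(rows, USED_FIELDS)])  # ['2017-07-01']
```

A field that is absent from a row entirely (`r.get(f)` returns `None`) also counts as missing.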

geneorama commented 7 years ago

Found the real source of the problem: when updating the daily summaries with the hourly summaries, I wasn't calculating the fields that we don't use, and that's why some of the fields in the linear-combination test were NA.
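To illustrate the fix in spirit, here is a sketch of building a daily row from hourly readings that fills in every daily field, not just the ones the model uses. The field names and the "midpoint of max and min" definition of `tavg` are assumptions for illustration, not NOAA's or the repository's definitions.

```python
from typing import Dict, List

def daily_from_hourly(temps: List[float]) -> Dict[str, float]:
    """Build one daily summary row from a day's hourly temperature readings.

    Every daily field is filled in, including ones the model never uses, so
    a check run over all columns never meets a silently missing value.
    """
    if not temps:
        raise ValueError("no hourly readings for this day")
    tmax, tmin = max(temps), min(temps)
    return {
        "tmax": tmax,
        "tmin": tmin,
        # Assumed convention: daily average as the midpoint of max and min.
        "tavg": (tmax + tmin) / 2,
    }

print(daily_from_hourly([70.0, 75.0, 88.0, 80.0]))
# {'tmax': 88.0, 'tmin': 70.0, 'tavg': 79.0}
```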

Updated the code to run the linear-combination test only on dates before missing_dates. That's fine, because the test is only meant to give a sense of "are these values highly related historically?", and we don't need every last data point for that.
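The comment doesn't show how missing_dates is structured. Assuming it is a collection of `datetime.date` values (the days patched from hourly data) and that each row carries a hypothetical `date` key, the restriction might look like this:

```python
from datetime import date
from typing import Iterable, List

def rows_before(rows: Iterable[dict], missing_dates: Iterable[date]) -> List[dict]:
    """Keep rows dated strictly before the earliest patched date, i.e. the
    historical period where the daily summaries are fully populated."""
    missing = list(missing_dates)
    if not missing:
        return list(rows)  # nothing was patched; use every row
    cutoff = min(missing)
    return [r for r in rows if r["date"] < cutoff]

rows = [
    {"date": date(2017, 6, 30), "tmax": 85.0},
    {"date": date(2017, 7, 1), "tmax": 88.0},
    {"date": date(2017, 7, 2), "tmax": 90.0},
]
missing_dates = [date(2017, 7, 2), date(2017, 7, 1)]
print([r["date"].isoformat() for r in rows_before(rows, missing_dates)])
# ['2017-06-30']
```

Cutting at the earliest patched date keeps the test on a contiguous historical window, which matches the goal of only gauging how related the values have been in the past.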