Closed tomschenkjr closed 6 years ago
For this, I think the proper resolution is to reframe the discussion to clarify the context of the prior-day nowcast model versus the hybrid nowcast model. The approach in our paper was to (1) point out that all predictive models use a combination of weather/hydrometeorological data and prior-day lab tests, (2) propose a new model form using qPCR and inter-beach correlations, and (3) compare the performance of the hybrid model against the prior-day model at Chicago's beaches.
Here are some ideas for a reworked discussion section:
Models attempting to forecast FIB levels at beaches have essentially used the same functional form: lab data from the previous day is combined with various predictors to predict whether FIB levels will exceed the suggested thresholds. Innovations have come from novel ways to collect the predictors, such as hydrometeorological sensors, that improve accuracy and save time. Likewise, more sophisticated algorithms, such as machine learning and genetic algorithms, have been used to improve performance.
Yet the concept behind these models remains the same: they rely on prior-day laboratory results, which is why we've dubbed this form the "prior-day nowcast model". Evidence suggests that the factors that contribute to FIB levels do not persist from day to day (citations). That can explain why the accuracy of many attempts to predict FIB levels at beaches is relatively low. Despite improvements to the analytical methods, those models are still dependent on day-old FIB data.
Previous research has found that FIB levels at Chicago's beaches are highly correlated with one another (citation), and that Chicago beaches rarely encounter consecutive days of elevated FIB levels. At the same time, qPCR testing has become more widely used, but is still expensive. Because qPCR testing provides immediate results, we proposed the hybrid nowcast model, which uses limited qPCR data to predict FIB levels at other beaches.
The hybrid nowcast model removes the dependency on day-old FIB information that is commonly used in other models. This approach more closely resembles a "missing data" problem, where we are attempting to "fill in" the missing values (beaches without qPCR testing). For beach networks that are highly correlated, like Chicago's, hybrid nowcasting was able to increase model sensitivity without increasing the rate of false positives.
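If it helps readers follow the sensitivity and false-positive comparison, here is a minimal Python sketch of how those two rates could be computed for a naive prior-day rule. The function names and the toy exceedance series are hypothetical illustrations, not taken from our analysis code or data.

```python
def sensitivity_and_fpr(actual, predicted):
    """Return (sensitivity, false-positive rate) for paired boolean sequences."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif a and not p:
            fn += 1
        elif not a and p:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return sensitivity, fpr


def prior_day_rule(exceedances):
    """Hypothetical naive rule: predict day t's exceedance from day t-1's lab result."""
    return list(exceedances[:-1])


# Toy series (made up): True means FIB exceeded the threshold on that day.
observed = [False, True, False, False, True, False, False, False]
predicted = prior_day_rule(observed)

# Predictions cover days 1..n-1, so compare against observed[1:].
sens, fpr = sensitivity_and_fpr(observed[1:], predicted)
print(sens, fpr)
```

In this toy series the exceedances never fall on consecutive days, so the prior-day rule catches none of them while still raising false alarms on the following days. That is the failure mode the paragraph above describes.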
Hybrid modeling uses a different approach. By identifying "clusters" of beaches, we exploit the inter-beach correlation to formulate a prediction. While this model used a random forest, the analytical model could be adjusted to use other approaches, such as genetic algorithms. Likewise, we clustered beaches using a basic k-means algorithm, but other methods can also be used. In either case, it seems that a significant improvement comes from shifting away from prior-day lab results.
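A rough sketch of the clustering idea, in case an illustration is useful for the rewrite. It assumes each beach is represented by a short vector of historical log-FIB readings, groups beaches with a basic k-means, and then fills in untested beaches from same-day qPCR readings in the same cluster. The cluster-mean fill-in is a deliberately simple stand-in for the random forest, and all names and numbers are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Basic Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fill_in(names, labels, todays_qpcr):
    """Estimate untested beaches from tested beaches in the same cluster."""
    estimates = {}
    for name, label in zip(names, labels):
        if name in todays_qpcr:
            continue  # this beach was tested today, nothing to fill in
        peers = [
            todays_qpcr[n]
            for n, lab in zip(names, labels)
            if lab == label and n in todays_qpcr
        ]
        estimates[name] = sum(peers) / len(peers) if peers else None
    return estimates


# Toy historical log-FIB series per beach (made up for illustration).
names = ["A", "B", "C", "D"]
history = [
    [1.0, 3.0, 1.2, 2.8],
    [1.1, 2.9, 1.0, 3.0],
    [3.0, 1.0, 2.9, 1.1],
    [2.8, 1.2, 3.1, 0.9],
]
labels = kmeans(history, k=2)

# Only beaches A and C receive qPCR tests today (hypothetical readings).
estimates = fill_in(names, labels, {"A": 2.9, "C": 1.0})
print(labels, estimates)
```

The point of the sketch is only the structure: clustering decides which tested beaches inform which untested ones, and the predictor inside each cluster can be swapped for something stronger.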
Second, qPCR testing was tactically assigned to beaches with higher rates of exceeding acceptable FIB levels. This helps reduce the variance that the model needs to explain...
This is great!
I'll add that qPCR was also chosen for beaches known in prior literature and/or found in our own study to have little predictive value, likely due to idiosyncratic geographical features. These beaches had higher rates of exceedances, which provided another tactical benefit. And by isolating beaches whose individual characteristics tend to contribute to outlier FIB levels, we were able to build a model excluding those beaches and only including beaches whose FIB levels tend toward the regional mean.