CDCgov / wastewater-informed-covid-forecasting

Wastewater-informed COVID-19 forecasting models submitted to the COVID-19 Forecast Hub
https://cdcgov.github.io/wastewater-informed-covid-forecasting/
Apache License 2.0

Refactor datapoint exclusion logic to use `anti_join` #30

Open kaitejohnson opened 1 week ago

kaitejohnson commented 1 week ago

Problem

The current implementation for excluding specific hospital admissions data points loops through locations one at a time and checks each for exclusions. Because we are already passing in a table of exclusions, we should use `anti_join` instead.

Requirements

Context

Not for this PR, but it would be good to refactor the exclusion logic so that it no longer has to go one location at a time and check carefully for multiple locations, etc. (The check is important because otherwise non-target data points could be filtered out.)

Given that you have a table of exclusions, I'd suggest writing this function so that it can work on multiple locations at once (though you can still always go one location at a time if you prefer). An `anti_join` on location and date should work:

https://dplyr.tidyverse.org/reference/filter-joins.html
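
As a rough illustration of the suggestion, here is a minimal sketch of what an `anti_join`-based exclusion could look like. The function name `exclude_hosp_datapoints`, the column names `location`, `date`, and `daily_hosp_admits`, and the toy values are all assumptions made for this example; they are not taken from the repository's code.

```r
library(dplyr)

# Hypothetical hospital admissions data: one row per location and date.
hosp_data <- tibble(
  location = c("MA", "MA", "MA", "NY", "NY", "NY"),
  date = as.Date(c(
    "2024-01-01", "2024-01-02", "2024-01-03",
    "2024-01-01", "2024-01-02", "2024-01-03"
  )),
  daily_hosp_admits = c(10, 500, 12, 20, 22, 900)
)

# Hypothetical exclusions table: one row per data point to drop.
exclusions <- tibble(
  location = c("MA", "NY"),
  date = as.Date(c("2024-01-02", "2024-01-03"))
)

# Hypothetical helper: drop every row of `data` that has a match in
# `exclusions` on BOTH location and date. All locations are handled
# in a single call, with no per-location loop.
exclude_hosp_datapoints <- function(data, exclusions) {
  anti_join(data, exclusions, by = c("location", "date"))
}

cleaned <- exclude_hosp_datapoints(hosp_data, exclusions)
```

Because the join matches on both keys, excluding a date in one location leaves the same date in other locations untouched, which is the concern about filtering out non-target data points raised above.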