Open sbfnk opened 7 months ago
My thought on how that would work is to have a new variable (accumulate) that indicates which days should be summed.
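For illustration, here is a minimal base R sketch of what such an indicator could mean. It assumes that `accumulate = TRUE` marks a day whose count is not reported on its own but is carried forward and summed into the next day with `accumulate = FALSE`. The function name `report_accumulated()` and the toy data are hypothetical and not part of EpiNow2.

```r
# Hypothetical illustration (not EpiNow2 code): collapse a daily series into
# the values that would actually be reported, given an accumulate indicator.
report_accumulated <- function(daily, accumulate) {
  # A new block starts right after each day with accumulate == FALSE
  block <- cumsum(c(0, head(!accumulate, -1)))
  # Sum the daily values within each block
  reported <- ave(daily, block, FUN = sum)
  # Accumulated days carry no report of their own
  reported[accumulate] <- NA
  reported
}

daily <- c(1, 2, 3, 4, 5, 6)
accumulate <- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
reported <- report_accumulated(daily, accumulate)
print(reported)
```

With this toy input, days 3 and 6 report the sums 6 and 15, and the accumulated days are NA.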
Another option would be to distinguish between explicit NAs (date exists in the data, value is NA, meaning missing) and implicit NAs (date doesn't exist in the data, meaning accumulate). This might make preprocessing easier, though it is potentially also easier to inadvertently get wrong.
Yeah, potentially, but I also think that could be a bit dangerous. I would instead suggest making a helper function that maps from that structure to the less dangerous explicit version for those that clearly want that.
And if going that way, I'd suggest that becomes a dependent issue.
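A rough sketch of what such a helper might look like in base R, under some assumptions: the input has columns `date` and `confirm`, dates absent from the input are the ones to accumulate over, and rows that are present with an NA value stay plain missing data. The name `fill_implicit_dates()` is hypothetical and not an existing EpiNow2 function.

```r
# Hypothetical helper (not part of EpiNow2): turn implicit gaps in the dates
# into explicit rows flagged with an accumulate column.
fill_implicit_dates <- function(data) {
  all_dates <- data.frame(
    date = seq(min(data$date), max(data$date), by = "day")
  )
  # Dates absent from the input are the ones to accumulate over
  all_dates$accumulate <- !(all_dates$date %in% data$date)
  # Left join: dates added here get confirm = NA
  filled <- merge(all_dates, data, by = "date", all.x = TRUE)
  filled[order(filled$date), c("date", "confirm", "accumulate")]
}

reported <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-05", "2024-01-06")),
  confirm = c(10, NA, 42, 12)
)
explicit <- fill_implicit_dates(reported)
print(explicit)
```

Here the explicit NA on 2024-01-02 stays missing (`accumulate = FALSE`), while the absent dates 2024-01-03 and 2024-01-04 are added with `accumulate = TRUE`. A real helper would probably also warn about the rows it creates.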
With appropriate warning messages as suggested in #771, I think this is the best option, as it can take all information from a 2-column data frame as before:
distinguish between explicit (date exists in the data, value is NA meaning missing) vs. implicit (date doesn't exist in the data meaning accumulate) NAs
I don't think I agree. I think there should be one way of handling missing data (as missing) and it can throw a warning if creating missing dates saying what it is doing.
I think overloading NAs like we have done for accumulation is confusing and dangerous and would much prefer a separate feature describing this.
Something I think we want to be aware of is non-standard schemes. These could be (1) non-constant reporting and (2) repeated reporting (some counts are reported twice as aggregates of different dates).
I haven't really seen the latter and it's quite an edge case, so it's unclear to me if we really want to support it or not.
I'm open to suggestions and acknowledge there are dangers in overloading interpretations. My ideal would be one in which it's fairly straightforward (and safe) to handle the common cases of daily/weekly data on incidence/prevalence and missingness that could correspond to zeroes or missed reports.
With appropriate warning messages as suggested in #771, I think this is the best option, as it can take all information from a 2-column data frame as before:
distinguish between explicit (date exists in the data, value is NA meaning missing) vs. implicit (date doesn't exist in the data meaning accumulate) NAs
This can now be checked with the test_data_complete() function introduced in #774, when merged.
I think this would be my preferred option as it would be more general but I also think it can be addressed in its own review as it would be a superset of this PR.
My thought on how that would work is to have a new variable (accumulate) that indicates which days should be summed.

Originally posted by @seabbs in https://github.com/epiforecasts/EpiNow2/pull/534#pullrequestreview-1880071190