epiverse-trace / cfr

R package to estimate disease severity and under-reporting in real-time, accounting for reporting delays in epidemic time-series
https://epiverse-trace.github.io/cfr/

Expected outcomes lower than number of deaths #165

Open CarmenTamayo opened 1 month ago

CarmenTamayo commented 1 month ago

One of the use cases for the reporting guidance paper (https://github.com/joshwlambert/epiparameterReportingGuidance/blob/cfr-truncation/inst/use_cases/cfr-truncation.R) uses cfr to compare the impact of using delay-adjusted vs unadjusted CFRs in further analyses. For this purpose, the data included in the package (Ebola from 1976) was truncated at 1976-09-30.

When running the function cfr_static with an onset-to-death delay distribution from the literature (Barry et al., 2018), as well as with delays from the {epiparameter} library, we get the following message:

Total deaths = 131 and expected outcomes = 126 so setting expected outcomes = NA. If we were to assume total deaths = expected outcomes, it would produce an estimate of 1.
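For readers unfamiliar with the check behind this message, here is a minimal sketch of the arithmetic, written in Python for illustration. It is not the cfr implementation: the function names, the toy case and death counts, and the discrete delay distribution are all hypothetical. It assumes the expected number of known outcomes is the case series convolved with the onset-to-death delay distribution and truncated at the last observed day.

```python
def expected_known_outcomes(cases, delay_pmf):
    """Expected number of cases whose outcome is known by the last day.

    cases[t] is the number of cases with onset on day t; delay_pmf[d] is the
    probability that the onset-to-outcome delay is d days (hypothetical values).
    """
    total = 0.0
    for t in range(len(cases)):
        for d, p in enumerate(delay_pmf):
            if t - d >= 0:
                # Cases with onset on day t - d are expected to resolve on day t.
                total += cases[t - d] * p
    return total


def delay_adjusted_cfr(cases, deaths, delay_pmf):
    """Return (estimate, expected outcomes); estimate is None when deaths exceed them."""
    expected = expected_known_outcomes(cases, delay_pmf)
    total_deaths = sum(deaths)
    if total_deaths > expected:
        # Mirrors the situation in the message: the ratio would be above 1.
        return None, expected
    return total_deaths / expected, expected


# Toy data, made up for illustration only (not the Ebola 1976 series).
cases = [10, 20, 30, 20, 10]
deaths = [0, 3, 8, 15, 20]
delay_pmf = [0.0, 0.2, 0.3, 0.3, 0.2]  # sums to 1

estimate, expected = delay_adjusted_cfr(cases, deaths, delay_pmf)
print(estimate, expected)
```

With these toy numbers the expected outcomes work out to 45 against 46 deaths, so the sketch returns no estimate, which is the same situation as 131 deaths against 126 expected outcomes.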

This specific cut-off date is towards the end of the outbreak and past its peak, so we'd expect a large proportion of the outcomes to be known already, and the delay-adjusted CFR to be converging on the true CFR. The CFR at the end of the outbreak is 0.95, and the naive estimate at the cut-off date is 0.74.

I imagine the adjusted CFR > 1 is due to low case ascertainment in the 1976 data (CFR 0.95 vs the Barry et al. estimate of 0.56). When a delay is then applied, especially one on the "longer" side (in this case the mean is 8 days), the expected outcomes can indeed be lower than the number of deaths.

I was wondering whether there are cases where this could be due to the wrong delay distribution being used. I'd also suggest making the warning message more informative about why this might be happening, so that the user can better understand whether it's due to a mistake or to the characteristics of their data.

tagging @adamkucharski for suggestions and @joshwlambert as part of the group where this topic was originally discussed

adamkucharski commented 1 month ago

Thanks for raising. This is related to the following issue, which proposes a statistically valid (but much more computationally intensive) solution for situations where the CFR is near 1 and the delay distribution is long, so that occasionally the number of deaths is, by chance, larger than E(known outcomes):

Have held off on implementing for now, because Ebola in 1976 is an extreme example given the very high CFR. That is not necessarily because of under-ascertainment: initial cases were via infected syringes, and the rural location would have limited treatment options (see Camacho et al., 2014 for more).
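To illustrate the point that deaths can exceed E(known outcomes) purely by chance when the CFR is near 1, here is a small simulation sketch in Python. It is not the solution proposed in the linked issue and not part of cfr. The case counts, delay distribution, function names and seed are hypothetical, and it assumes each case dies independently with probability equal to the CFR, after a delay drawn from the delay distribution.

```python
import random


def expected_known_outcomes(cases, delay_pmf):
    """Cases convolved with the delay pmf, truncated at the last observed day."""
    return sum(
        cases[t - d] * p
        for t in range(len(cases))
        for d, p in enumerate(delay_pmf)
        if t - d >= 0
    )


def exceedance_fraction(cases, delay_pmf, cfr, n_sims=2000, seed=1):
    """Fraction of simulated outbreaks where observed deaths > expected outcomes."""
    rng = random.Random(seed)
    last_day = len(cases) - 1
    expected = expected_known_outcomes(cases, delay_pmf)
    delays = range(len(delay_pmf))
    exceed = 0
    for _ in range(n_sims):
        observed_deaths = 0
        for onset_day, n_cases in enumerate(cases):
            for _ in range(n_cases):
                if rng.random() < cfr:
                    delay = rng.choices(delays, weights=delay_pmf)[0]
                    # Only deaths occurring on or before the cut-off are observed.
                    if onset_day + delay <= last_day:
                        observed_deaths += 1
        if observed_deaths > expected:
            exceed += 1
    return exceed / n_sims


# Toy inputs, made up for illustration only.
cases = [10, 20, 30, 20, 10]
delay_pmf = [0.0, 0.2, 0.3, 0.3, 0.2]

high_cfr = exceedance_fraction(cases, delay_pmf, cfr=0.95)
low_cfr = exceedance_fraction(cases, delay_pmf, cfr=0.30)
print(high_cfr, low_cfr)
```

Under these assumptions, a noticeable share of simulated outbreaks with a CFR of 0.95 end up with more observed deaths than expected outcomes, while with a CFR of 0.30 this essentially never happens. That matches the description of Ebola in 1976 as an extreme case.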