CarmenTamayo opened 1 month ago
Thanks for raising. This is related to the following issue, which proposes a statistically valid (but much more computationally intensive) solution for situations where the CFR is near 1 and the delay distribution is long, so that by chance the number of deaths occasionally exceeds the expected number of known outcomes:
We have held off on implementing this for now, because Ebola in 1976 is an extreme example given the very high CFR. That is not necessarily because of underascertainment: the initial cases were infected via syringes, and the rural location would have limited treatment options (see Camacho et al., 2014 for more).
One of the use cases for the reporting guidance paper (https://github.com/joshwlambert/epiparameterReportingGuidance/blob/cfr-truncation/inst/use_cases/cfr-truncation.R) uses {cfr} to compare the impact of using delay-adjusted vs unadjusted CFRs in further analyses. For this purpose, the data included in the package (Ebola from 1976) was truncated, specifically on 1976-09-30. When running the function `cfr_static()` with an onset-to-death delay from the literature (Barry et al., 2018), as well as with delays from the {epiparameter} library, we get the following message:
> Total deaths = 131 and expected outcomes = 126 so setting expected outcomes = NA. If we were to assume total deaths = expected outcomes, it would produce an estimate of 1.
This specific cut-off date is towards the end of the outbreak and past its peak, so we would expect a large proportion of the outcomes to be known by then, and the true CFR and the delay-adjusted CFR should be converging. The CFR at the end of the outbreak is 0.95, and the naive estimate at the cut-off date is 0.74.
I imagine the adjusted CFR > 1 is due to case ascertainment in the 1976 data being low (CFR 0.95, vs the CFR of 0.56 estimated by Barry et al.). When a delay is then applied, especially one on the "longer" side (here the mean is 8 days), the expected number of outcomes can indeed end up lower than the number of deaths.
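To make it concrete why the expected outcomes can fall below the observed deaths, here is a rough numerical sketch. The counts and the Poisson(8) delay are made up for illustration (the mean of ~8 days loosely matches the delay discussed above); this is not the package's exact estimator, only the general Nishiura-style convolution it is based on:

```python
import numpy as np

# Illustrative daily counts near the end of an outbreak
# (made-up numbers, not the 1976 Ebola data shipped with {cfr}).
cases = np.array([10, 20, 30, 25, 15, 8, 4, 2, 1, 0])
deaths = np.array([0, 5, 18, 28, 26, 16, 9, 5, 2, 1])

# Stand-in onset-to-death delay: a discrete Poisson(8) PMF, chosen only
# because its mean (~8 days) matches the delay discussed above.
days = np.arange(30)
factorials = np.concatenate(([1.0], np.cumprod(days[1:].astype(float))))
pmf = np.exp(-8.0) * 8.0 ** days / factorials

# Expected known outcomes per day: cases convolved with the delay PMF,
# truncated at the cut-off date.
expected = np.convolve(cases, pmf)[: len(cases)]

# With a high CFR and a long delay, cumulative deaths at the truncation
# date can exceed the cumulative expected outcomes, so deaths / expected > 1.
naive_cfr = deaths.sum() / cases.sum()    # near the 0.95 discussed above
adjusted = deaths.sum() / expected.sum()  # > 1 in this toy series
```

With under-ascertained cases (or a delay that is too long for the data), the convolution simply cannot "supply" enough expected outcomes by the cut-off date to cover the deaths already observed, which is the situation the warning message is flagging.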
I was wondering whether there are cases where this could instead be caused by the wrong delay distribution being used. I would also suggest making the warning message more informative about why this might be happening, so that the user can better tell whether it is due to a mistake or to the characteristics of their data.
Tagging @adamkucharski for suggestions, and @joshwlambert as part of the group where this topic was originally discussed.