Exploring surveillance data biases when estimating the reproduction number: with insights into varying subpopulation transmission in the first Covid-19 outbreak in England
[x] correct Rt typesetting
to be corrected in Word when revisions complete
[x] capitalise “Table” and “Figure” references
[x] change “Rts” throughout, e.g. to “Rt values” or “Rt estimates”
edited to Rt estimates, estimates of Rt, Rt estimated from...
[x] page 7, line 46, ref should be to Fig SI2A, rather than SI1A
Methods
[x] test out UK specific delays if time permits
this was not completed in time
[x] clarify - delays for hospital admissions and test positives are treated as having the same delay from onset (and therefore the same lag from infection to observation)
included, page 4 para 1
[x] Page 5 – extra explanation as to how the uncertain distributions were then sampled. For example, how is the uncertainty in both mean and standard deviation captured? When estimating these delays, mean and standard deviation are coupled, so is the uncertainty generated from e.g. a posterior sample of mean and standard deviation pairs, or are means and standard deviations sampled assuming the uncertainty is independent?
Edited to include (google doc p4):
When sampling uncertain distributions of time intervals, the mean and standard deviation were sampled independently. However, the impact of this assumption is limited: although samples were used in the fitting process, the convolution was not explicitly defined.
[x] P5, L37: Is this prior informed by the data not equivalent to “using the data twice”?
Edited to include: (google doc p3)
This method therefore uses the same data both as a prior and in fitting. This assumes that observed growth is equivalent to unobserved growth, and particularly impacts the first few observations. We explored the alternative of using an independent time point prior, but this suffered problems with model identification.
[x] P5, L38: How are imputations done?
Clarified that imputations are done in the model, with the initial number of infections (as described) weighted by generation time and Rt.
[x] what is an uncertain generation time p.m.f.?
Is the p.m.f. itself uncertain, or is the p.m.f. known but modelling a stochastic outcome?
Edited to include
The generation time was assumed to be known and fixed over time, with an uncertain mean and standard deviation that were sampled on each model run in order to preserve uncertainty. (google doc p3)
Add note to reviewer: this known, fixed generation time assumption is a pragmatic choice to preserve uncertainty - e.g. where sampling at each time step would narrow CIs with little justification for a time-varying generation time - but this is being explored
[x] mathematical details of the modelling that was used, rather than simply references to other papers – e.g. the full Bayesian model specification (with priors)
added more model spec including math notation, and included priors (proof: p5 para 3; google doc p2-3)
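Illustrative sketch for the Methods responses above (not the model code used in the paper): one way to read "mean and standard deviation sampled independently, once per model run" and "initial infections weighted by generation time and Rt". Everything here is an assumption made for illustration: the function names, the normal uncertainty on mean and sd, the density-based discretisation of a gamma distribution, and all numeric values.

```python
import math
import numpy as np


def sample_mean_sd(rng, mean, mean_sd, sd, sd_sd):
    """Draw a mean and an sd independently (no correlation between the two)."""
    m = max(rng.normal(mean, mean_sd), 0.1)  # floor keeps the draw positive
    s = max(rng.normal(sd, sd_sd), 0.1)
    return m, s


def gamma_pmf(mean, sd, max_days):
    """Approximate daily p.m.f.: gamma density at days 1..max_days, normalised."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    days = np.arange(1, max_days + 1)
    log_dens = ((shape - 1) * np.log(days) - days / scale
                - math.lgamma(shape) - shape * math.log(scale))
    dens = np.exp(log_dens)
    return dens / dens.sum()


def simulate_infections(rt, seed_infections, gt_pmf):
    """Renewal step: I_t = R_t * sum_s w_s * I_{t-s}, starting from seed infections."""
    infections = list(seed_infections)
    for r in rt:
        recent = infections[::-1][:len(gt_pmf)]  # most recent day first
        force = sum(w * i for w, i in zip(gt_pmf, recent))
        infections.append(r * force)
    return np.array(infections[len(seed_infections):])


# One "model run": a single (mean, sd) draw, held fixed over all time steps.
rng = np.random.default_rng(1)
gt_mean, gt_sd = sample_mean_sd(rng, mean=4.0, mean_sd=0.5, sd=2.0, sd_sd=0.5)
gt_pmf = gamma_pmf(gt_mean, gt_sd, max_days=15)
infections = simulate_infections([1.2] * 10, [10.0] * 15, gt_pmf)
```

Repeating the last four lines with fresh draws gives one trajectory per run, so generation time uncertainty carries through across runs while each run keeps a fixed generation time. Sampling (mean, sd) pairs from a joint posterior instead would only change `sample_mean_sd`.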
Discussion
[x] UK-specific vs global delays - would this improve/worsen the discrepancy between admissions and deaths
See para 5 of page 7. Added some text on the decision to use global vs UK-specific delays (i.e. the trade-off between UK-specific delays and public data). As for the implication for the discrepancy, added the following:
The difference in source of delay distributions should not have substantially altered our conclusions about discrepancies between central estimates of Rt from either test-positives or admissions, compared to Rt from deaths. However, using the public linelist for the delay to test or admission may have introduced additional uncertainty around the respective Rt estimates, compared to greater accuracy (reduced uncertainty) in estimates of Rt from deaths based on a UK-specific delay distribution.
[x] Implications of delays being the same for cases / hospitalisations - for example, with the higher testing rate rolled out over summer and wider community testing, the delay from symptom onset to testing might have decreased, whereas the delay from onset to hospital admission will not have experienced the same change.
added: google doc p7. Key point:
This would have a differential impact on the accuracy of Rt estimates over time in either direction, which could explain some of the oscillating variability in Rt estimates from test-positive case data compared to hospital admissions. We had no data over time on delays from symptom onset to reporting in each data source with which to test this hypothesis. However, we have mitigated some of the likely impact by using independent sampling over an uncertain delay distribution for each set of estimates.
[x] Page 6 line 55 – “However, as much as spatial variation, the data sources used to estimate Rt influenced the earliest date of epidemic decline.” – edit for clarity
Edited to:
However, the data sources used to estimate Rt were as important as any regional variation in estimating the earliest date of epidemic decline.
[x] Page 9 line 21 - local nosocomial outbreaks could have also contributed to this discrepancy
edited, now includes:
Alternatively, this may reflect an early sampling bias which disproportionately represented healthcare workers in testing, compared to admissions or deaths data. In the early spring period, testing was largely limited to hospital settings. Rt estimates from test-positives would then represent a separate route of transmission in healthcare settings, compared to that among the general population. If healthcare workers were then less susceptible to severe disease, an early peak in Rt from test-positive cases may be due to a wave of nosocomial infections [#Evans-2020] which would not have been represented in Rt from hospital admissions or deaths. Similarly, if transmission moved through the general population later than transmission in healthcare settings, then the timing of peaks in Rt from each data source would not have matched.
[x] discuss whether pooling estimates might help provide a more robust estimate of Rt, or whether it’s better to present multiple estimates to policy makers
Edited discussion (draft proof: p10 para 2) to clarify we recommend against pooling estimates, on the basis of both unclear weightings, and information loss (google doc p9, para 1)
Please could you check the google doc where marked up (i.e. title; some edits to methods; additional text added in discussion)?
The above (reviewer comments + responses) is my basis for the review response letter. This is drafted at the end of the google doc. Please could you check this meets expectations?
Following review, some edits to make in this google doc: 201203 - Rt comparison - revision