[ ] “While there is no clear pattern in the distributions of within-group approximate Cramér distances among two week ahead forecasts by model types, Cramér distances four week ahead forecasts generated from mechanistic models had higher medians and variance for both horizons. This indicates that four week ahead forecasts from mechanistic models were more dissimilar to one another compared to those from statistical models.” My read of Fig 4(a) is that this is generally true, but not universally true. Additionally, am I right that the boxplots summarize a distribution of points, where each point is the mean distance for one model pair? If so, what appears to be a relatively small general tendency in the plot could be driven, e.g., by one or two models that differ from the others and are responsible for many points. Relatedly, if I understand the setup of the test correctly, doesn't the claim that “four week ahead forecasts from mechanistic models were more dissimilar to one another compared to those from statistical models” run directly counter to the failure to reject the null hypothesis of no difference in mean dissimilarity by model type?
[ ] “The greater degree of leading behavior of median approximate Cramér distances among four week ahead forecast pairs, as measured by the higher proportion of highest cross-correlations occurring negative lags, could be due to a higher uncertainty captured by wider prediction intervals, indicating more disagreement among forecasts for farther horizons.” I don’t see meaningful evidence for “the higher proportion of highest cross-correlations occurring negative lags” in the right-hand side of Fig 7(b). Relatedly, the Fig 7(b) caption reads: “The distribution of lags at which maximum absolute cross-correlations values occurred seem to show that median Cramér distances lead an increase in incident deaths at four week ahead horizon.” The implication is that something differs between the 2 and 4 week horizons. Qualitatively, the histograms for 2 and 4 weeks look pretty similar to me, and I think we’re interpreting a difference in proportions of something like 18/36 vs 13/36.
[ ] Fig 6(b) caption: “Cross-correlation plots of the medians of approximate CDs and incident deaths in Illinois show different leading/lagging behaviors of medians differ in two analysis periods.” But the left and right columns look qualitatively very similar to me.
[ ] It would be nice to make an editing pass for grammar.
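
To put a rough number on the Fig 7(b) comment above, here is a minimal sketch of a two-sided Fisher exact test comparing two proportions. The counts 18/36 and 13/36 are only my eyeballed reading of the histograms, not values taken from the analysis, and the function name `fisher_exact_two_sided` is my own. The test also assumes the 36 units in each histogram are independent, which may not hold here, so treat the result as a sanity check rather than a formal analysis.

```python
from math import comb


def fisher_exact_two_sided(k1, n1, k2, n2):
    """Two-sided Fisher exact p-value for comparing k1/n1 against k2/n2.

    Sums the hypergeometric probabilities of all 2x2 tables (with the
    same margins) that are no more probable than the observed table.
    """
    total_success = k1 + k2
    denom = comb(n1 + n2, total_success)

    def prob(a):
        # Probability that group 1 contributes `a` of the successes,
        # given fixed row and column totals.
        return comb(n1, a) * comb(n2, total_success - a) / denom

    lo = max(0, total_success - n2)
    hi = min(n1, total_success)
    p_obs = prob(k1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(a) for a in range(lo, hi + 1) if prob(a) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Illustrative counts only (eyeballed from the histograms, not real data).
p = fisher_exact_two_sided(18, 36, 13, 36)
print(f"two-sided Fisher exact p-value: {p:.3f}")
```

If the actual counts give a similarly large p-value, that would support softening the text and caption about a difference between horizons.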