Closed Peter-Kedron closed 2 years ago
Sampling Plan
Attribute Variable Transformations
Analysis
More later still...
Planned Differences and Evaluating Results
Unplanned Deviations
Reproduction Result
Discussion
Emily, you can address these issues with Reproduction Results:
Abstract
* Does Jay use 'challenges due to COVID-19'? What exactly does that mean, and how is it measured? It seems the analysis looks at spatial co-location of cases and PwDs. I don't think that captures challenges at all. So unless Jay directly used that language, I think we should avoid it.
Good catch. Jay frames the paper in "challenges", but then focuses the intent of the analysis & contribution more narrowly. Removed the challenges framing.
* I would shorten this abstract to ~250 words and just hit on what, why, how, outcomes. I think what is written here could be an intro section. Perhaps we worry about that change as this becomes a manuscript. In that transition, I would add to the intro the central motivations of any reproduction: to evaluate the original work through reanalysis, clarify decision making, and open it to further critical review.
You're right-- this is serving more like an introduction than an abstract. We shall refine it for the manuscript & leave the report as-is.
Additional Hypothesis Block
* We don't have to do this, but in the COVID reproductions we included a boxed-in block that stated the hypothesis tested and the result of each test from the original paper. We did this because 1) None of the papers were great about clearly stating the hypotheses they were testing. In most cases the authors alluded to wanting to study the relationship between X and COVID, but never linked that specifically to their statistical model or clearly stated a falsifiable hypothesis. 2) Because of (1) it was usually a lot of work to connect their claims to their tests to their hypotheses. So we did the work for the reader of making it very plain what was tested, how, and what the results were. I like this because it went a long way to clarifying what the authors actually did and whether their statistics lined up with their plan.
Good observation. In fact, the original paper has the same issue of not clearly specifying the hypotheses. Additionally, I am having trouble finding good examples of clearly specified GEE hypotheses, especially in the context of GEE producing estimates, not inferences. Attempting to craft something anyway...
Study Design
* Do both H1 and H2 rely on exactly 5 models?
H1 is many bivariate correlation tests while H2 is 5 models. Added clarification to text.
* If we don't add the Ho block above, we should state in here somewhere what the original Hos were and the associated tests and outcomes. OR below the lead paragraph of the original study design section
Adding in an H0 block.
Original Study Design
* I know this is a quote, "whether COVID-19 incidence is significantly greater in counties containing higher percentages of socio-demographically disadvantaged [people with disabilities], based on their race, ethnicity, poverty status, age, and biological sex", but it is very confusing. I don't understand what he is saying. Is he conceptualizing each of those things as disabilities, which seems odd? Or is there some cross-cut in the data that segments the people w/disabilities by race... I can't imagine that is the case given the data. Perhaps we can add a follow on sentence that clarifies this some.
It is confusing, and I was always uncertain if the denominator was the total population or the population with disabilities. I hope I have sufficiently clarified with an example.
* 49 contiguous states? Did we finally get Quebec? Or is it 48 + DC? Wishful thinking about the balance of political power in the US Senate. Although I admire your verve to match taxation with representation.
Fixed, and Emily learned some North American geopolitics in the process...
* Give the version of SatScan and Arc.
Added SatScan version-- only know about Arc through verbal conversations with Jay and don't know the version. I'm ok leaving the ambiguity in our report since it was also unspecified in the paper. He also used SPSS (unknown version), which I need to include.
Sampling Plan
* I would add a sentence saying how the data from Jay was used. It is implied it was viewed as a check. I would just be explicit. Did you look at the GEE input files before you made your first pass at the reproduction (i.e., was the reproduction blind to his data files first)?
Done-- added an enumerated list of purposes.
* The verb tense is mixed up in the first subsection here. I think it is just a holdover from the pre-analysis plan. Some is future tense. Some is past tense. I think the report should all be past tense.
Done: switched to past tense and removed some pre-analysis specific sections.
Attribute Variable Transformations
* What counts as 'outside the cluster' in the relative risk calculation? Is it everywhere that is not that cluster or another cluster? Is there some spatial bound? Do these clusters cross state lines? I assume so.
Attempted to explain better, referred to relevant maps, and added an example description for New England.
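For reference, a minimal sketch of the relative risk calculation under discussion. It assumes 'outside the cluster' means every location in the study area that is not in that cluster (including counties that belong to other clusters) and that expected counts sum to the observed total, which is my understanding of how SaTScan defines it. The function name and the numbers are illustrative and are not taken from the report.

```python
def relative_risk(cases_in, expected_in, total_cases):
    """Estimated risk inside a cluster divided by estimated risk outside it.

    Assumption: expected counts over the whole study area sum to
    total_cases, so the expected count outside the cluster is
    total_cases - expected_in.
    """
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    risk_in = cases_in / expected_in
    risk_out = cases_out / expected_out
    return risk_in / risk_out


# Illustrative numbers only: 20 observed vs 10 expected cases inside the
# cluster, out of 100 cases in the whole study area.
rr = relative_risk(cases_in=20, expected_in=10, total_cases=100)
print(round(rr, 2))
```

With this definition, adding or removing other clusters does not change a given cluster's relative risk, because the comparison is always against the rest of the study area.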
* The relative risk becomes a categorical variable. What categories were set to which parts of the scale? Was one assigned as a zero? Which?
Created a classification table to clarify this.
* What was the spatial information on the SatScan (e.g., projection etc.)? It probably wouldn't change much, but would be good to state. Oh, I see it at the top of the analysis. Never mind.
Good question, though. We added this, and clarified that it's a spherical great circle calculation agnostic to the actual GCS. Also discovered that spatialepi has a lazy way of calculating distance with a lat/long grid.
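To illustrate what a spherical great-circle calculation looks like, here is a minimal sketch using the haversine formula. This is one common way to compute such distances and is not necessarily the exact formula SaTScan or spatialepi uses; the radius constant and function name are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two lat/long points in degrees.

    Treats the Earth as a sphere, so the result does not depend on the
    datum of the geographic coordinate system.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# A quarter of the equator: from (0, 0) to (0, 90).
print(great_circle_km(0.0, 0.0, 0.0, 90.0))
```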
Analysis
* I am not sure about that 1st and 2nd order effect statement. It seems like there is some work done on this. The cluster and its inclusion in the GEE is one way to do it, as is the state control variable. These would work like fixed effects that sponge up some of the 1st order variance. I'll need to think over how to talk about this issue.
True, I think the State ID attempts to account for first order effects and the COVID risk class attempts to account for second order effects.
* On the temporal support. I might just re-state that this is a cross-sectional study and the study period is X, which means the temporal extent is also X.
OK!
Planned Differences and Evaluating Results
* Flip the verbs to past tense as this is now the report telling what was done. The language is speculative whereas it should now be a definitive report of what happened and why. We might just say here that we followed the pre-analysis plan and give the link and perhaps a short overarching summary. If all this is repetitious, then for the report the really interesting part is the unplanned deviations.
It should be largely consistent with the pre-analysis plan, right? If the document should stand on its own, then transitioning to past tense makes sense. Done.
Unplanned Deviations
* So this is the very interesting section. I am not sure I have totally wrapped my head around it just reading the description. However, there is certainly enough here to merit the reproduction (reanalysis) and illustrate its value. The reanalysis is the key as you've tried to move to an operationalization that matches what seemed to be Jay's intent. As we move to publication, I think this is the part to build upon.
Thanks, and agreed!
Reproduction Result
* I might label this cumulative cases and give the date range in the map. Ideally maps can be read as standalone entities.
Added clarification of the variable and date range, which I think implies that it is cumulative across the range of dates.
* It may be good to provide the original values for these in Table 1 and a final column that documents agreement/disagreement in perhaps direction and rough magnitude.
Agreed, and done.
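One way such an agreement column could be computed is sketched below. This is only an illustration: the 10% tolerance for "rough magnitude", the function names, and the example values are all assumptions, not what was used in the report.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def compare_estimate(original, reproduced, rel_tol=0.10):
    """Compare an original estimate with its reproduction.

    rel_tol is a hypothetical threshold for 'similar magnitude', expressed
    as a fraction of the original value.
    """
    if original == 0:
        rel_diff = 0.0 if reproduced == 0 else math.inf
    else:
        rel_diff = abs(reproduced - original) / abs(original)
    return {
        "same_direction": sign(original) == sign(reproduced),
        "rel_diff": rel_diff,
        "similar_magnitude": rel_diff <= rel_tol,
    }


# Made-up values for illustration only.
print(compare_estimate(0.50, 0.52))
print(compare_estimate(0.50, -0.10))
```

Reporting direction and magnitude separately keeps a sign flip from being hidden behind a small absolute difference.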
* I'd encourage a little more explanation around Figs 4-6. This is the really interesting part of the re-analysis. How was Jay operationalizing in Fig 4? Was he simply assigning that center county relative risk to all the counties in the circles in Fig 5?
Yes, just the center county. Hopefully clarified this.
* I am confused about the GEE results. Are those the results of the GEE that matched Jay's set-up? Or are these the results of your improved relative risk assignment? Or did you never use the relative risk assignment in the model and just did it to demonstrate a better path forward? If the last is true, perhaps that is the avenue to take in any further re-analysis, perhaps in a better modelling framework.
Clarified all of the GEE results and added them for each of the
Discussion
* Can you identify what portion of the mismatch was due to the computational environment? Probably not, but you say this a few times and it is always nice to quantify these things if possible.
Added new tables to results for this.
* We should talk more about the deeper question of the clustering approach mixed with regression and the points about using this method with county polygons. I think we can chat about that and build out for next steps to a publication. Assuming all are interested.
I looked over the report again. This report is looking pretty good. Is your plan to move to the paper next?
Abstract
Additional Hypothesis Block
Study Design
Original Study Design
Ok more later