Closed MosesStewart closed 7 months ago
They seem focused on how we measure COVID rates. Remember when people were measuring COVID rates in the sewer system? I would be interested in how much of this data there is. Will look after 1:30 when my class ends.
They gave examples of dependent variables such as infections per capita and positive testing rate that we can already construct with the data they provided.
If you want to take the initiative to get additional data then that's fine, but I would keep in mind that this is only a qualification round. I think they're just looking to make sure we can show a causal relationship with comprehensible reasoning in around 4-6 hours of work.
I would recommend looking at the paper they cited. They use the same data we are given, and answered several of my questions:
CHANNELS: They found a correlation between occupations and COVID rates, where people with jobs where they had to interact more had higher rates. After controlling for occupations, they found that the length of commute was not significant. They also found significant correlation for household size.
MODEL: I think, similar to the paper, it would be best to run a few regressions, including more variables each time.
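One way to picture the "few regressions, more variables each time" idea is the sketch below. It is plain Python rather than Matlab, and the numbers, the variable names, and the `ols` / `solve` helpers are all made up for illustration; none of it comes from the paper or the provided dataset.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(y, columns):
    """OLS coefficients of y on an intercept plus the given columns (intercept first)."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(xtx, xty)


# Hypothetical zip-code level values, invented for this sketch.
covid_rate = [1.0, 1.9, 2.8, 1.6, 3.5, 2.2]
share_exposed_jobs = [0.20, 0.35, 0.50, 0.30, 0.60, 0.45]
household_size = [2.1, 2.8, 3.4, 2.5, 3.9, 2.6]
commute_minutes = [25.0, 40.0, 30.0, 35.0, 45.0, 28.0]

# Each specification adds one more regressor to the previous one.
specs = [
    ("occupation", [share_exposed_jobs]),
    ("+ household size", [share_exposed_jobs, household_size]),
    ("+ commute", [share_exposed_jobs, household_size, commute_minutes]),
]
for name, cols in specs:
    print(name, [round(b, 3) for b in ols(covid_rate, cols)])
```

Watching how the occupation coefficient moves as controls are added is the point of nesting the specifications; standard errors are left out here on purpose.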
- I'm honestly not familiar with proving causal relationships with non-indicator random variables, so any ideas on how to structure the model would be appreciated. I can also try to look at some stuff later.
In the paper they cited, they just used p-values on the coefficient being greater than zero. (Edit) I'm leaning towards simply using a z-test for all of our p-values if no one objects.
- They purposely excluded data for 2 days. I'm not sure what the best way to handle that is.
- We have COVID data that stretches over 2 months, but for the CHANNELS I suggested, there is only fixed data from one time point. If we want to regress over zip codes (locations), how do we implement the COVID data stretching over time?
In the paper they just averaged over weeks. We can do the same and I think not worry about it.
- They ask why spatial autocorrelation correction is more appropriate than a simple heteroskedasticity correction. I do not know. I would have to do some reading to answer this.
I still don't know this, or why a regression would be well suited to this situation or what weaknesses it has. I have class again, but I plan to start writing the code and text around 7:00 pm, so it would be nice if we could finalize the direction we want to go for CHANNELS and MODEL.
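For the z-test idea mentioned in this comment, a minimal sketch of a one-sided p-value for "coefficient greater than zero" could look like the following. The function name and the example numbers are hypothetical, and it assumes the estimate is approximately normal, which is an assumption of the z-test rather than something checked here.

```python
import math


def one_sided_p_value(coef, std_err):
    """P-value for H0: beta <= 0 against H1: beta > 0 under a normal approximation."""
    z = coef / std_err
    # Upper-tail probability of the standard normal, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical coefficient and standard error, just to show the call.
print(one_sided_p_value(0.8, 0.4))
```

Whatever standard error gets plugged in (plain, heteroskedasticity-robust, or spatial) changes the p-value, so the choice of correction matters more than the test itself.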
- They ask why spatial autocorrelation correction is more appropriate than a simple heteroskedasticity correction.
This was a lot easier than I thought ~ a heteroskedasticity correction assumes that the errors are independent across observations, which we wouldn't expect if neighboring zip codes are correlated with each other.
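To make that difference concrete, here is a rough sketch for a single regressor. It compares a heteroskedasticity-only (HC0) standard error with a spatial one that also sums cross terms between observations within a distance cutoff. The uniform kernel, the planar coordinates in km, and the function name are assumptions for illustration; this is not a port of the Hsiang code.

```python
import math


def slope_and_errors(x, y, coords, cutoff_km):
    """Slope of y on x (with intercept), plus HC0 and uniform-kernel spatial standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xd = [xi - mx for xi in x]                      # demeaned regressor
    sxx = sum(v * v for v in xd)
    beta = sum(v * (yi - my) for v, yi in zip(xd, y)) / sxx
    resid = [(yi - my) - beta * v for v, yi in zip(xd, y)]
    score = [v * e for v, e in zip(xd, resid)]

    # HC0: only each observation's own squared term enters the "meat".
    hc0_meat = sum(s * s for s in score)

    # Spatial: also add cross terms for every pair closer than the cutoff.
    spatial_meat = 0.0
    for i in range(n):
        for j in range(n):
            if math.dist(coords[i], coords[j]) <= cutoff_km:
                spatial_meat += score[i] * score[j]

    # A uniform kernel can give a negative sum in small samples, so clamp at zero.
    se_hc0 = math.sqrt(hc0_meat) / sxx
    se_spatial = math.sqrt(max(spatial_meat, 0.0)) / sxx
    return beta, se_hc0, se_spatial


# Hypothetical data: four locations on a plane, coordinates in km.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(slope_and_errors(x, y, coords, cutoff_km=2.0))
```

With a cutoff of zero the spatial version collapses to HC0, which is one way to see that the heteroskedasticity correction is the special case where no two locations are allowed to be correlated.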
I will start working on an implementation of the code they provided in Matlab tonight. If we want to add data re https://github.com/MosesStewart/uofc_prelim/issues/1#issuecomment-1938940774 then we can do that later. After the implementation, we will still need to discuss independent variables and start writing.
I have class until 2:00 pm but here are my initial thoughts:
CHANNELS
The prompt asks us to identify what channels we think COVID spread through. After looking at the dataset, some possible "channels" could be:
METHOD
The prompt seems to be pushing us towards a regression, since they link code from "Hsiang" that uses standard errors corrected for spatial autocorrelation. We do have latitude data, which I think we need for this. I have no idea how spatial autocorrelation works, but since they believe this should only take us 4-6 hours, I'm guessing they expect us to use this. I would prefer implementing in Matlab over Stata.
MODEL
They are asking for causal analysis. The two variables I suggested through channels are both continuous random variables. I'm honestly not familiar with proving causal relationships with non-indicator random variables, so any ideas on how to structure the model would be appreciated. I can also try to look at some stuff later. If we go with regression, all we really need to decide is which variables we want to include.
Other questions I haven't been able to figure out:
- We have COVID data that stretches over 2 months, but for the CHANNELS I suggested, there is only fixed data from one time point. If we want to regress over zip codes (locations), how do we implement the COVID data stretching over time? It may be helpful to look at other papers. Should we focus on the early weeks of the pandemic like they say in the abstract? Should we do a weighted average of cases, with heavier weights during early weeks?
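As a rough sketch of the two options raised here, the snippet below collapses weekly rates to one number per zip code, either as a plain average or with heavier weights on early weeks. The zip code, the rates, the weights, and the function name are all hypothetical and only show the mechanics.

```python
from collections import defaultdict


def collapse_over_time(rows, weights=None):
    """Collapse (zip_code, week_index, rate) rows to one (weighted) mean rate per zip code."""
    num = defaultdict(float)
    den = defaultdict(float)
    for zip_code, week, rate in rows:
        w = 1.0 if weights is None else weights[week]  # equal weights by default
        num[zip_code] += w * rate
        den[zip_code] += w
    return {z: num[z] / den[z] for z in num}


# Hypothetical weekly rates for one zip code.
rows = [("60601", 0, 2.0), ("60601", 1, 4.0)]
print(collapse_over_time(rows))                    # plain average over weeks
print(collapse_over_time(rows, {0: 3.0, 1: 1.0}))  # early week weighted more heavily
```

Either version produces a single cross-section that lines up with the one-time-point channel variables, so the regression over zip codes stays the same and only the dependent variable changes.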