MosesStewart / uofc_prelim

Decide what method we want to use #1

Closed: MosesStewart closed this issue 7 months ago

MosesStewart commented 7 months ago

I have class until 2:00 pm, but here are my initial thoughts:

Other questions I haven't been able to figure out:

zoe-shleifer commented 7 months ago

They seem focused on how we measure COVID rates. Remember when people were measuring COVID rates in the sewer system? I would be interested in how much of this data there is. I will look after 1:30 when my class ends.

MosesStewart commented 7 months ago

They seem focused on how we measure COVID rates. Remember when people were measuring COVID rates in the sewer system? I would be interested in how much of this data there is.

They gave examples of dependent variables such as infections per capita and positive testing rate that we can already construct with the data they provided.
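For reference, here is a minimal Python sketch of how those two dependent variables could be built per zip code. The `ZipRecord` type and its field names (`population`, `cases`, `positive_tests`, `total_tests`) are hypothetical; the provided dataset's columns may be named and organized differently.

```python
from dataclasses import dataclass


@dataclass
class ZipRecord:
    # Hypothetical schema: one row per zip code (or per zip code and period).
    zip_code: str
    population: int
    cases: int
    positive_tests: int
    total_tests: int


def infections_per_capita(r: ZipRecord, per: int = 100_000) -> float:
    """Confirmed cases per `per` residents (default: per 100,000)."""
    # Multiply before dividing so round inputs give exact results.
    return r.cases * per / r.population


def positive_test_rate(r: ZipRecord) -> float:
    """Share of administered tests that came back positive, between 0 and 1."""
    if r.total_tests == 0:
        return float("nan")  # rate is undefined when no tests were administered
    return r.positive_tests / r.total_tests


# Illustrative numbers only, not taken from the provided data.
example = ZipRecord("zip_a", population=50_000, cases=250, positive_tests=240, total_tests=2_000)
print(infections_per_capita(example))  # 500.0
print(positive_test_rate(example))     # 0.12
```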

If you want to take the initiative to get additional data, that's fine, but keep in mind that this is only a qualification round. I think they just want to make sure we can show a causal relationship with comprehensible reasoning in around 4-6 hours of work.

MosesStewart commented 7 months ago

I would recommend looking at the paper they cited. It uses the same data we were given and answered several of my questions:

  • I'm honestly not familiar with proving causal relationships with non-indicator random variables, so any ideas on how to structure the model would be appreciated. I can also try to look at some stuff later.

In the paper they cited, they just used p-values for the one-sided test that the coefficient is greater than zero. (Edit) I'm leaning towards simply using a z-test for all of our p-values if no one objects.
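A minimal sketch of the z-test p-values in Python, assuming we already have a coefficient estimate and its standard error from the regression output. The function names are hypothetical, and only the standard library's `math.erfc` is used for the normal tail.

```python
import math


def one_sided_p_value(beta_hat: float, se: float) -> float:
    """P-value for H0: beta <= 0 vs. H1: beta > 0 using the normal (z) approximation."""
    z = beta_hat / se
    # Upper tail of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided_p_value(beta_hat: float, se: float) -> float:
    """P-value for H0: beta = 0 vs. H1: beta != 0 using the normal (z) approximation."""
    z = abs(beta_hat / se)
    # Both tails: P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical coefficient and standard error, e.g. read off the regression output.
print(round(one_sided_p_value(0.8, 0.4), 4))  # z = 2.0, p is about 0.0228
```

With a large number of zip codes the normal approximation is reasonable; with only a handful of observations, a t distribution would give somewhat larger (more conservative) p-values.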

  • They purposely excluded data for 2 days. I'm not sure what the best way to handle that is.
  • We have COVID data that stretches over 2 months, but for the CHANNELS I suggested, there is only data from a single time point. If we want to regress over zip codes (locations), how do we incorporate COVID data that varies over time?

In the paper they just averaged over weeks. We can do the same and, I think, not worry about it.
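A minimal Python sketch of that approach, assuming daily rows of `(zip_code, date, value)`: drop the excluded days, average within each ISO week and zip code, then average the weekly means so each zip code has one outcome to pair with the single-time-point CHANNEL data. The row layout, the function names, and the `excluded_dates` argument are hypothetical (the thread doesn't say which 2 days were excluded), and the paper's exact aggregation may differ.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_averages(rows, excluded_dates=frozenset()):
    """Average daily values within each (zip code, ISO year, ISO week),
    skipping any date listed in `excluded_dates`."""
    buckets = defaultdict(list)
    for zip_code, day, value in rows:
        if day in excluded_dates:
            continue
        iso_year, iso_week, _ = day.isocalendar()
        buckets[(zip_code, iso_year, iso_week)].append(value)
    return {key: mean(vals) for key, vals in buckets.items()}


def per_zip_outcome(weekly):
    """Collapse weekly means to one cross-sectional value per zip code."""
    per_zip = defaultdict(list)
    for (zip_code, _, _), value in weekly.items():
        per_zip[zip_code].append(value)
    return {zip_code: mean(vals) for zip_code, vals in per_zip.items()}


# Illustrative rows only; the 2020-04-08 value is dropped as an "excluded" day.
rows = [
    ("zip_a", date(2020, 4, 6), 1.0),
    ("zip_a", date(2020, 4, 7), 3.0),
    ("zip_a", date(2020, 4, 8), 100.0),
    ("zip_a", date(2020, 4, 13), 5.0),
]
weekly = weekly_averages(rows, excluded_dates={date(2020, 4, 8)})
print(weekly)                   # {('zip_a', 2020, 15): 2.0, ('zip_a', 2020, 16): 5.0}
print(per_zip_outcome(weekly))  # {'zip_a': 3.5}
```

Averaging the weekly means (rather than all days at once) keeps a week with more reported days from dominating the zip code's outcome.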

  • They ask why a spatial autocorrelation correction is more appropriate than a simple heteroskedasticity correction. I do not know; I would have to do some reading to answer this.

I still don't know this, or why a regression would be well-suited to this situation and what weaknesses it has. I have class again, but I plan to start writing the code and text around 7:00 pm, so it would be nice if we could finalize the direction we want to go for CHANNEL and MODEL.

MosesStewart commented 7 months ago
  • They ask why spatial autocorrelation correction is more appropriate than a simple heteroskedasticity correction.

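This was a lot easier than I thought: a heteroskedasticity correction (e.g., White robust standard errors) still assumes that the regression errors are independent across observations, which we wouldn't expect if nearby zip codes are correlated with each other. A spatial autocorrelation correction relaxes this by allowing errors to be correlated between observations that are close to each other.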
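A minimal Python sketch of the difference for a single-regressor OLS: the HC0 (White) variance only sums each observation's own squared score, while a Conley-style spatial HAC variance also adds cross-products for pairs of zip codes within a distance cutoff. Assumptions: coordinates are planar (e.g., projected centroids in km, so `math.dist` is meaningful; latitude/longitude would need a great-circle distance), the uniform kernel and the `cutoff` values are illustrative choices, and the function names are hypothetical. The Matlab code they provided may use a different kernel or cutoff.

```python
import math


def _ols_scores(x, y):
    """Fit y = a + b*x by OLS; return slope, per-observation scores, and Sxx."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    xd = [xi - xbar for xi in x]                   # demeaned regressor
    sxx = sum(d * d for d in xd)
    b = sum(d * (yi - ybar) for d, yi in zip(xd, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    scores = [d * e for d, e in zip(xd, resid)]    # demeaned x_i times residual_i
    return b, scores, sxx


def hc0_se(x, y):
    """White (HC0) SE of the slope: allows unequal error variances,
    but treats errors as uncorrelated across observations."""
    _, s, sxx = _ols_scores(x, y)
    meat = sum(si * si for si in s)
    return math.sqrt(meat) / sxx


def conley_se(x, y, coords, cutoff):
    """Conley-style spatial HAC SE of the slope with a uniform kernel:
    also adds score cross-products for pairs within `cutoff` of each other."""
    _, s, sxx = _ols_scores(x, y)
    meat = 0.0
    for i, ci in enumerate(coords):
        for j, cj in enumerate(coords):
            if math.dist(ci, cj) <= cutoff:        # i == j always counts (distance 0)
                meat += s[i] * s[j]
    # A uniform kernel is not guaranteed to give a non-negative sum; clamp at 0.
    return math.sqrt(max(meat, 0.0)) / sxx


# Illustrative data only: six zip-code centroids spaced 1 km apart on a line.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
coords = [(float(k), 0.0) for k in range(6)]
print(hc0_se(x, y))
print(conley_se(x, y, coords, cutoff=0.5))  # no neighbours within 0.5 km: same as HC0
print(conley_se(x, y, coords, cutoff=1.5))  # adjacent zip codes now contribute
```

With no neighbors inside the cutoff the two estimates coincide. At the other extreme, a cutoff covering every pair collapses the sum to (sum of scores)^2, which is zero for OLS, so the cutoff should reflect how far spatial correlation plausibly extends.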

MosesStewart commented 7 months ago

I will start working on an implementation of the code they provided in Matlab tonight. If we want to add data (see https://github.com/MosesStewart/uofc_prelim/issues/1#issuecomment-1938940774), we can do that later. After the implementation, we will still need to discuss independent variables and start writing.