ecn310 / course-project-accidentsteam

course-project-accidentsteam created by GitHub Classroom

Data Fix Research #12

Status: Closed (annarupert closed this issue 9 months ago)

annarupert commented 10 months ago


Hi guys! I posted about our data issue on my daily log, and @kbuzard recommended the following:

"I think the next thing to do is to find literature that actually uses this data and look at what they say about the data. If there are systematic problems with the data, other papers will probably talk about it and say what they do to address it. This is the best way to figure out what to do. There isn't a one-size-fits all approach--it depends on the context, and what work people have done to figure out the source of the problem."

If we could start looking for some literature that could help us approach this issue, that would be great!

Feel free to drop any links below. Please include a short synopsis of what you found and how it applies to our issue.

annarupert commented 10 months ago

Hi @ecn310/accidentsteam!

I looked through Professor Singleton's paper: Li, L., & Singleton, P. (2019). The Effect of Workplace Inspections on Worker Safety. ILR Review, 72(3), 718-748.

In this paper, Prof. Singleton explains that they had to drop some problematic observations from the data. I have included a quote below that shows the language they used:

"To derive the analysis sample of interest, we impose three additional restrictions. First, observation pairs are dropped if the first year occurs in 1996, as these data were not used to implement the SST plan.[10] Second, the sample is restricted to states that participated in the SST plan, which includes all 29 states under federal jurisdiction with respect to OSHA and 6 states that operate state plans. Third, observation pairs are excluded if the case rate from the ODI is missing or exceeds 100, eliminating 1.9% of the sample.[11] The remaining sample contains 154,808 observation pairs among 61,702 unique establishments, for an average of 2.5 observation pairs per establishment. A total of 25,460 establishments have only one observation pair."
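To make the three restrictions in that quote concrete, here is a rough sketch of how they could be applied to our own data. It is only an illustration under assumptions: the `ObservationPair` record and its field names (`first_year`, `state`, `case_rate`) are hypothetical, and the list of SST participating states is not given in the quote, so it is passed in as a parameter rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class ObservationPair:
    """One establishment observation pair (hypothetical field names)."""
    establishment_id: str
    first_year: int
    state: str
    case_rate: Optional[float]  # None when the case rate is missing


def restrict_sample(pairs: Iterable[ObservationPair],
                    sst_states: Set[str]) -> List[ObservationPair]:
    """Apply the three restrictions quoted from Li & Singleton (2019)."""
    kept = []
    for p in pairs:
        # 1. Drop pairs whose first year is 1996.
        if p.first_year == 1996:
            continue
        # 2. Keep only states that participated in the SST plan.
        if p.state not in sst_states:
            continue
        # 3. Drop pairs whose case rate is missing or exceeds 100.
        if p.case_rate is None or p.case_rate > 100:
            continue
        kept.append(p)
    return kept
```

Comparing `len(restrict_sample(pairs, sst_states))` with `len(pairs)` would show what share of our sample these rules remove, which we could then compare against the 1.9% reported in the paper for the case-rate rule alone.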

I spoke with Dylan and created a new variable in the data called inj_rate. This variable is the total number of injuries divided by the total number of employees.
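As a rough sketch of the inj_rate calculation, the function below divides injuries by employees and returns `None` when either value is missing or the employee count is not positive, so those rows can be flagged instead of causing a division error. The column names `total_injuries` and `total_employees` are assumptions for illustration; our dataset's actual variable names may differ.

```python
from typing import Dict, List, Optional


def injury_rate(total_injuries: Optional[float],
                total_employees: Optional[float]) -> Optional[float]:
    """inj_rate = total injuries / total employees (None if undefined)."""
    if total_injuries is None or total_employees is None:
        return None
    # A zero or negative employee count cannot produce a meaningful rate.
    if total_employees <= 0:
        return None
    return total_injuries / total_employees


def add_inj_rate(rows: List[Dict]) -> List[Dict]:
    """Add an 'inj_rate' entry to each row (hypothetical column names)."""
    for row in rows:
        row["inj_rate"] = injury_rate(row.get("total_injuries"),
                                      row.get("total_employees"))
    return rows
```

Rows where `inj_rate` comes back as `None` are the ones that would need a decision (drop or investigate), similar to how the paper handles missing case rates.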

I found that this variable isolates the data we want to focus on and helps remove many of the unclear observations we have been running into.