Watts-College / cpp-524-fall-2021

https://watts-college.github.io/cpp-524-fall-2021/

Step 4 - Question about selecting estimators #20

Open dholford opened 2 years ago

dholford commented 2 years ago

Hello,

I'm trying to fully grasp how exactly "hypothetical" applies to the evaluations we are doing in our final paper. For my evaluation I think the best fit would be a difference-in-differences estimator, to see how the treatment would lead to different rates of anxiety/depression. In the real world it would be very challenging to find a reliable, common measure to use as the pre-test and post-test. Since I'm interested in looking at school districts across the country, there isn't necessarily a yearly, common test administered in all states that measures anxiety and depression. (I could look at the rate of 504 plans for anxiety and depression; some districts administer a climate survey, and some schools do participate in yearly health surveys that include stress and anxiety, but even some of this data wouldn't necessarily be widely available.) In designing this evaluation, am I creating a design that assumes I could have been there at the beginning to administer some sort of pre-test common across both treatment and control, or am I operating in the real world, where I don't have that luxury and need to find a way to make it work anyway? I think there is a higher chance of measurement error if I'm reflexively identifying a pre-test, but that is more accurate to what would actually have to be done in this scenario.

Thanks, Dylan

danafuller commented 2 years ago

@dholford I do know of a questionnaire that some pediatricians use to screen for depression: the PHQ-9 Modified for Teens. This could be a good measure for you. I know it is used with children as young as 11, but I am not sure whether it is introduced any earlier than that.
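If the PHQ-9 were used as the pre-/post-test measure, scoring is straightforward. A minimal sketch, assuming the standard nine-item scoring (each item 0-3, total 0-27, with the commonly cited severity cutoffs); the responses below are hypothetical, and the teen modification adds items beyond these core nine:

```python
# Scoring sketch for a standard PHQ-9 response.
# Assumes the usual nine items scored 0-3 each (total range 0-27)
# and the commonly cited severity bands; responses are hypothetical.

def phq9_total(items):
    assert len(items) == 9 and all(0 <= x <= 3 for x in items)
    return sum(items)

def phq9_severity(total):
    if total >= 20:
        return "severe"
    if total >= 15:
        return "moderately severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "minimal"

responses = [1, 2, 1, 0, 2, 1, 1, 0, 0]  # hypothetical screener answers
total = phq9_total(responses)
print(total, phq9_severity(total))  # 8 mild
```

A simple total like this is what would be compared before and after treatment in each group.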

danafuller commented 2 years ago

@lecy I would love some guidance on my counterfactual. I am looking at the effects of Covid mitigation strategies in a school district at the elementary school level. I am specifically homing in on district-initiated strategies that are passive for the student: air flow turnover in the classroom, sanitation, and an additional measure of nutrient-dense food served as part of the free breakfast and lunch program, in conjunction with a daily healthy-best-practices education segment as part of the treatment. An additional treatment of unmasking the children (excluding the staff) will be added to determine what effect and interaction all of these things have on Covid mitigation.

I am having a hard time clearly defining my counterfactual, I think because there are many layers to my research design. I am thinking this: Covid rates at the elementary school level remain stable under current practices of masking, airflow transfer, and sanitation, with instances of students experiencing anxiety, emotional distress, and barriers to learning, specifically barriers to communication. Is that a specific and strong enough counterfactual?

My treatment will be a daily education segment on healthy best practices (personal hygiene; the effect of food, exercise, and sleep on the immune system) and unmasking the elementary school children only. I will add the additional treatment of nutrient-dense food served over the course of the school year, with inflammation markers as a pre- and post-test, as well as attendance records, nurse notes of student visits, and self-reported sickness. I will also have a pre- and post-test related to emotional/mental health. I will keep the current district-initiated practices of sanitation and increased airflow transfer in the classrooms.

lecy commented 2 years ago

@dholford

For my evaluation I think the best fit would be a difference-in-differences estimator, to see how the treatment would lead to different rates of anxiety/depression. Since I'm interested in looking at school districts across the country, there isn't necessarily a yearly, common test administered in all states that measures anxiety and depression. In designing this evaluation, am I creating a design that assumes I could have been there at the beginning to administer some sort of pre-test common across both treatment and control, or am I operating in the real world, where I don't have that luxury and need to find a way to make it work anyway?

The one restriction is that you can't use an RCT to construct your counterfactual (it simplifies the process too much; the real skill in designing an evaluation is being creative about the counterfactual).

However, you are free to design a hypothetical evaluation that would require you to collect data with your study groups before and after. Specifically, this works if you are staging an intervention. It's harder if you want to do a retrospective study, for example looking back at 2020 to see how school closings impacted kids. Data collection will be expensive, but you don't need to stay within budget for the study if that is the best way to answer your research question.
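For concreteness, the difference-in-differences arithmetic mentioned above can be sketched in a few lines. The numbers are hypothetical screening-score means, not from any real district; the point is just the mechanics: the treatment group's change minus the comparison group's change nets out the shared trend, under the parallel-trends assumption.

```python
# Hypothetical mean anxiety-screening scores (higher = more anxiety).
pre_treat, post_treat = 12, 9   # treatment district, before/after
pre_ctrl, post_ctrl = 12, 11    # comparison district, before/after

# DiD estimate: change in the treatment group minus change in the
# control group, removing the trend common to both districts.
did = (post_treat - pre_treat) - (post_ctrl - pre_ctrl)
print(did)  # -2: a two-point drop beyond the shared trend
```

This is why a reliable pre-test measure common to both groups matters so much: both differences are computed on the same instrument.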

In the real world it would be very challenging to find a reliable, common measure as the pre-test and post-test.

There should be some decent instruments for depression, some probably pretty quick to administer and some tailored to kids.

lecy commented 2 years ago

@danafuller

My treatment will be a daily education segment on healthy best practices (personal hygiene; the effect of food, exercise, and sleep on the immune system) and unmasking the elementary school children only. I will add the additional treatment of nutrient-dense food served over the course of the school year, with inflammation markers as a pre- and post-test, as well as attendance records, nurse notes of student visits, and self-reported sickness. I will also have a pre- and post-test related to emotional/mental health. I will keep the current district-initiated practices of sanitation and increased airflow transfer in the classrooms.

For the purpose of this study I would either bundle everything into a single treatment or else isolate one intervention and use other features as controls.

You are basically describing three treatments here, which would require a factorial design. You would need separate groups for the following to cleanly identify effects of each:

T1
T2
T3
C
T1+T2
T1+T3
T2+T3
T1+T2+T3
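The eight arms above can be enumerated mechanically; this sketch just makes the 2^3 factorial structure explicit (T1, T2, T3 standing for the three treatments described in the comment):

```python
from itertools import product

# Enumerate the 2^3 = 8 arms of a full factorial design with three
# binary treatments; a group with no treatments is the control "C".
factors = ["T1", "T2", "T3"]
arms = []
for combo in product([0, 1], repeat=len(factors)):
    label = "+".join(f for f, on in zip(factors, combo) if on) or "C"
    arms.append(label)

print(len(arms))  # 8
print(arms)       # ['C', 'T3', 'T2', 'T2+T3', 'T1', 'T1+T3', 'T1+T2', 'T1+T2+T3']
```

Every additional binary treatment doubles the number of groups needed, which is why bundling or isolating treatments keeps the design tractable.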

This is why I tell students to keep it simple. You are describing a research agenda or a dissertation here, not a single study. If you want high internal validity, you need to isolate one factor at a time. It's frustrating, because you want actionable knowledge and the real world is complex! But a study that is inconclusive doesn't help advance knowledge on any of these items.

dholford commented 2 years ago

@dholford I do know of a questionnaire that some pediatricians use to screen for depression: the PHQ-9 Modified for Teens. This could be a good measure for you. I know it is used with children as young as 11, but I am not sure whether it is introduced any earlier than that.

Thanks, Dana! I'll check that out!

dholford commented 2 years ago

@dholford

For my evaluation I think the best fit would be a difference-in-differences estimator, to see how the treatment would lead to different rates of anxiety/depression. Since I'm interested in looking at school districts across the country, there isn't necessarily a yearly, common test administered in all states that measures anxiety and depression. In designing this evaluation, am I creating a design that assumes I could have been there at the beginning to administer some sort of pre-test common across both treatment and control, or am I operating in the real world, where I don't have that luxury and need to find a way to make it work anyway?

The one restriction is that you can't use an RCT to construct your counterfactual (it simplifies the process too much; the real skill in designing an evaluation is being creative about the counterfactual).

However, you are free to design a hypothetical evaluation that would require you to collect data with your study groups before and after. Specifically, this works if you are staging an intervention. It's harder if you want to do a retrospective study, for example looking back at 2020 to see how school closings impacted kids. Data collection will be expensive, but you don't need to stay within budget for the study if that is the best way to answer your research question.

In the real world it would be very challenging to find a reliable, common measure as the pre-test and post-test.

There should be some decent instruments for depression, some probably pretty quick to administer and some tailored to kids.

Thanks @lecy. I am having an internal battle with that. I'm interested in thinking through how to determine whether or not "Do No Harm" grading actually served to reduce stress for students during school closures. I could create a hypothetical, future-facing evaluation, but I'm also curious about what it would look like to evaluate this in real life, which would require some sleuthing in retrospect and may not even be possible with any degree of certainty/statistical significance.

lecy commented 2 years ago

I'm interested in thinking through how to determine whether or not "Do No Harm" grading actually served to reduce stress for students during school closures.

The issue with a retrospective study is that you might have measures of student performance like grades, but how would you retrospectively measure stress?

Do any schools systematically collect that sort of information? If not stress, some assessment of mental health?

Even if you could find a proxy like suicide rates, you have a problem of co-determinacy: other stressors associated with the pandemic likely raised suicide rates much more than the grading system at school did. That will be an issue for any measure of stress.

If you could find a measure of stress, though, and compare a school that used do-no-harm grading with one that didn't, you might have a chance.

But in general this sort of retrospective study that necessitates a specialized outcome measure of a latent construct like stress would be challenging.

Are there alternatives? What about student performance on standardized tests at schools that used do no harm grading versus those that didn't? Student performance in the first year of college?

Otherwise I would advise being prospective with that kind of study: find a school that is going to change grading systems and see how the change impacts students, comparing the cohort before the change with the cohort after.