Watts-College / cpp-524-fall-2021

https://watts-college.github.io/cpp-524-fall-2021/

Final Project Ideas #6

Open lecy opened 3 years ago

lecy commented 3 years ago

A discussion board on research ideas for the final project.

lecy commented 3 years ago

Violence in Schools

I was wondering if this topic would be okay for the final project memo. I was thinking of hypothetically creating a program for high school adolescents that teaches coping skills for depression, anger management, and illegal substance use in order to reduce the incidence of mass shootings in schools. I understand it would be hard to measure the outcome because we wouldn't know if the treatment worked, since we are unable to see an alternate universe that shows us a tragedy we prevented due to the treatment program. I believe another way to measure the program outcome would be changes in personality and behavior in a longitudinal study.


These sound interesting. I would search around for programs that look like what you describe - I am sure they are out there.

My suggestion would be to think of a generic program of this sort. What might it be called? For example, violence reduction in schools. Or are you focusing specifically on mass shootings (which would be much more rare).

You will likely have to make a decision if you focus on root causes (toxic masculinity, radicalization, mental health, etc.) or if you focus on conditions that enable violence in the presence of mental health or social dislocation (level of access to guns, use of security protocols in school, etc.).

"I understand it would be hard to measure the outcome because we wouldn't know if the treatment worked, since we are unable to see an alternate universe that shows us a tragedy that we prevented due to the treatment program. I believe another way to measure the program outcome would be changes in personality and behaviors in the longitudinal study."

I agree - choosing school shootings as the outcome could be challenging. So maybe find a proxy that is highly correlated with school shootings? Is there a common factor across many of the mass shooting cases (perpetrators develop certain extreme beliefs, experience bullying, share similar mental health challenges)? Could you use measures of those pre-conditions as the outcome, as you suggest above? Or other measures of violence like fights or weapons in school?

Also consider the context. White suburban schools vs diverse urban schools. I think violence would look very different in these contexts.

You might look at the literature on the ban on assault rifles and domestic violence for some ideas on research design.

Tech use is also interesting - similar comments apply: see if you can find an existing program (a school that bans devices) and find a study to see how they are operationalizing some of the ideas.

lecy commented 3 years ago

College Scholarships

I am a recruiter for the undergrad honors program at the ASU downtown campus, and I have a strong interest in first-generation students, students of color, and international students.

ASU currently offers scholarships based on a student's GPA or test scores, but I believe some students have an advantage in achieving high GPAs or test scores because they have better access to resources.

It would be very interesting if I were able to evaluate whether scholarship amounts should be determined based on those factors.

Also, ASU is reducing the amount of some awards starting in FA22. The decision has already been made, but it would be interesting to see whether it impacts the number of applications, enrollment, etc.

Another idea I have is that currently DACA students need to pay out-of-state tuition in AZ, while in some other states they qualify for in-state tuition. I would like to examine whether qualifying DACA students for in-state tuition would affect taxpayers positively or negatively.

I would appreciate it if you could give me any advice.


These are great questions and make for very interesting research design.

I would start by thinking about a hypothetical field experiment where you get to assign some kids to the "treatment" and some kids to the control.

What would your treatment be in each of these examples?

- A change in scholarship criteria - or adding things like evidence of overcoming hardship as a dimension of the scholarship
- A drop in tuition
- An increase in tuition (a drop in scholarship size)

You can find lots of programs that have these characteristics. The Georgia HOPE Scholarship comes to mind - the state basically made tuition free if you are a Georgia resident and achieve a minimum high school GPA, then maintain your GPA in college.

https://www.issuelab.org/resources/4112/4112.pdf

It would be instructive if nothing else because it shows how difficult constructing an evaluation framework can be. Mainly you have the problem that some kids would have gone to college in Georgia anyway and their parents can afford it, so the program subsidizes those that don't need it. Other kids would have left the state for college but will now stay, meaning admissions for state schools is more competitive (you still need to get accepted to attend). And lots of kids that need the scholarship lose it because they are unprepared for college and have to work while studying to support themselves.

In essence, you can approach the question from the student success vantage point. You need to divide the world into three groups - those that would have succeeded without the program (though they might not have debt now), those that won't succeed even with the program, and those that would not have attended / finished college but can now with the program.
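If it helps to make that three-group framing concrete, here is a back-of-the-envelope sketch with made-up numbers (recipient counts, award size, and group shares are all hypothetical) showing how much of the scholarship spending reaches the students whose behavior the program actually changes:

```python
# Hypothetical illustration of the three-group framing above.
# All numbers are invented for the example, not taken from any real scholarship program.

recipients = 10_000              # hypothetical number of scholarship recipients
award = 5_000                    # hypothetical award per student per year (USD)

shares = {
    "would_succeed_anyway": 0.50,    # benefit is less debt, not a changed outcome
    "wont_succeed_regardless": 0.20,
    "marginal_students": 0.30,       # attend / finish only because of the program
}

total_cost = recipients * award
for group, share in shares.items():
    print(f"{group}: ${share * total_cost:,.0f}")

marginal_grads = recipients * shares["marginal_students"]
print(f"Cost per marginal student: ${total_cost / marginal_grads:,.0f}")
```

The same kind of arithmetic carries over to the public-investment angle below - you just swap economic returns in for student outcomes.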

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C3&q=georgia+hope+scholarship&oq=georgia+hope

You can also approach it from a public investment standpoint - given the cost of the program is it a good economic investment for the state? That's more complicated because even if kids would have gone to college anyways they might now stay in the state, which means more people with college degrees, which can benefit the economy.

So one aspect of the project is narrowing down the research question - pick one or two outcomes to focus on - most likely pick either a student achievement perspective or a public investment perspective. For the scholarship reduction question I might take the angle that you are losing some of the best students if they have better scholarship offers in another state. What are the implications for the Arizona economy of letting top students leave the state? Would they return afterwards? Does the economy lose out on having that group gone, or is there a surplus of highly-educated in the state? Would those students have left anyhow to attend more prestigious or more intimate private institutions? Maybe the reduction doesn't really change who accepts the scholarships in the end, just how much support that group gets?

There are other programs like scholarships for kids that are in their final years of college, close to a degree but out of money. Try to get them over the finish line, so to speak. Or college retention programs in general. Lots of important questions on these topics.

https://www.pushkin.fm/episode/carlos-doesnt-remember/

https://www.pushkin.fm/episode/my-little-hundred-million/

https://www.pushkin.fm/episode/food-fight/

So yes - all possible. I would suggest looking for programs that look like these examples and starting your literature review there. Lots of resources if you can define your question.

lecy commented 3 years ago

Child Sponsorship

I'm interested in doing my program evaluation paper on a non-profit organization called Compassion International based out of Colorado Springs, CO. A study was conducted in 2013 on this particular program, and I was wondering if I can use this study for my final project (see attached).

Wydick et al._Compassion_2013 (1).pdf


It looks like a solid study that employs a lot of tools we cover this semester.

I'm not sure what you mean by "use this study for your final project"? Your job is to design an evaluation for a specific program or program model, which does not mean selecting a design that has already been used and simply replicating it.

However, I think it's a great idea to start from what currently exists and see if you can improve upon it. The real challenge in this case, from what I can see, is that the study looks to be very rigorous and thoughtful, so you might have a hard time figuring out how to improve upon it.

Are you interested in child sponsorship programs specifically? Or this organization?

Some potential ideas: design an evaluation for a similar child sponsorship program that has a different program model (provides different levels or types of support). Borrow from this study but don't copy it completely.

There are lots of cash transfer programs that have characteristics of this sponsorship model but are different in other ways.

https://www.poverty-action.org/impact/cash-transfers-changing-debate-giving-cash-poor

https://www.povertyactionlab.org/evaluation/impact-financial-incentives-school-participation-mexico

Or similar interventions in international development like the Millennium Villages Project or Microfinance.

https://www.thelancet.com/journals/langlo/article/PIIS2214-109X(18)30065-2/fulltext

https://hbr.org/2016/10/making-microfinance-more-effective

The paper is a great start. Just be sure to find a slight variation of the program model so that you are not merely reporting on what these authors did in the study, and so you think through some of the evaluation design choices on your own.

lecy commented 3 years ago

Masks in Schools

I wanted to run my possible research design idea by you to see if I am remotely in the ballpark. It is actually happening in real time in my life. I have five school-age children. Masking at schools has been a very contentious issue. I am specifically interested in elementary school age children. I know of only one study that did research on mask wearing in the school setting, covering about 90,000 people, including both staff and students. The study was done by the CDC and found that mask wearing by staff was statistically significant in mitigating COVID, but it was not statistically significant for students, which I found interesting and wanted to explore more. I am curious if the treatment effect is different based on age. It would be very interesting to know if the natural waxing and waning of the virus is a confounding variable for the mitigation attributed to mask wearing in young children. Do you see this as a possible workable topic?


Definitely, if we can't tackle the timely and important topics then why learn these skills?

I think the biggest question here is: how should we evaluate the effectiveness of masking in schools? Because there are lots of dimensions:

- Protection of school staff (teachers that have preexisting conditions or care for the elderly)?
- Protection of kids - transmission rates versus adverse effects (lots of kids seem to fight it off without symptoms - before Delta at least)
- Protection of families of kids and stemming rates of spread in general - they might not get sick but spread the virus

And your other question about ebbs and flows of the disease seems important. COVID classic seemed to not impact kids significantly, but rates of hospitalization increased significantly with recent strains. The right policy would depend a lot on the strain you are dealing with (or more generally the seriousness and spread of a strain in schools).

The proper response may vary depending on the stage of a pandemic and how many people have natural or vaccine immunity. 

A core issue in this type of study is defining terms and figuring out measurement. Positive tests in non-random samples of students are extremely problematic because testing availability varies by site and most people don't get tested until they have symptoms. How would you measure COVID in your study? If you can't randomize kids or schools into treatment groups could you still require random testing to assess community rates like they did at ASU? Or sample from sewage to check for communal viral load? 

Alternatively you could look at things like rates of severe illness or hospitalization. 

Those measures might be more tangible, but also change over time with strains and rates of herd immunity that would be independent of masking. As a result your counterfactual would need to be comparison to other schools at similar points in time in similar geographies to account for disease progression. 

If you search around for studies on masking you will find that there is a lot out there, even if not all of it has been published yet. 

ebossert commented 3 years ago

I'm interested in evaluating the Bureau of Prisons' Residential Drug Abuse Program (RDAP). It's an intensive 500-hour program for inmates with substance abuse issues, and completion can reduce an inmate's sentence by 12 months. It has been running for over 20 years, and the new First Step Act (FSA) passed in 2018 could give even more funding to this program. I think it would be a good choice because there are also many program evaluations of prison drug treatment programs that I could use for my literature review. Since there is a waitlist for inmates to get into the program, that may make for a decent counterfactual: those who would have joined voluntarily but completed their sentence before being accepted. I'm just not really sure what to use as the measure of program outcome - recidivism, relapse, or both?

https://www.bop.gov/inmates/custody_and_care/substance_abuse_treatment.jsp

lecy commented 3 years ago

@ebossert it sounds like an interesting program to evaluate.

Your challenge will be figuring out how to measure impact. For example, what is the goal of the program? Reduce recidivism? Or alternatively reduce the likelihood of substance abuse? Someone could relapse without ending up back in jail. But it's very easy to observe rates of recidivism since that would be data collected by the state and it's a very binary outcome (you stay out or you come back).

Relapse is harder to observe - do you trust self-reports of drug use? Test them at regular intervals? The measure is also more nuanced - if a person has one drink and retains control over behavior is that relapse? Use of marijuana for pain or anxiety but not recreationally?

More importantly, with these sorts of treatments there is no pre-treatment measure (recidivism can only be measured after release - it's a meaningless measure in the pre-treatment period, for example). As a result if you are using the post-treatment only estimator you need to ensure your treatment and control groups are equivalent.

These studies typically use randomization for group construction. You will see a mental health example in one of the chapters on lab 3. Since you can't use randomization for your program design how can you ensure the groups are equivalent?

You definitely want to limit your study group to those that volunteer for the program if that will continue to be a program characteristic (mandatory treatment is much less effective, typically). Can you use some sort of program roll-out lag to create a meaningful comparison group? Select a group of facilities where the treatment starts immediately and another group where it starts 2 years from now. Allow residents to sign up at both not knowing which group they belong to. See if there is a difference in relapse or recidivism for all volunteers released at the same time across both types of facilities.

But you would definitely want to compare measured demographics of both groups to test the assumption that the groups are equivalent prior to treatment.
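If it helps, here is a minimal sketch of how you might compare the immediate-start and delayed-start facilities and check group equivalence. The file name, column names, and covariates are hypothetical placeholders:

```python
import pandas as pd
from scipy import stats

# Minimal sketch of the roll-out lag comparison described above.
# Assumes a hypothetical file with one row per volunteer released in the study window:
#   group      - "immediate" (facility offers RDAP now) or "delayed" (starts in 2 years)
#   recidivate - 1 if re-incarcerated within the follow-up window, 0 otherwise
#   age, prior_offenses - example covariates for checking group equivalence
df = pd.read_csv("rdap_volunteers.csv")

# Post-test-only comparison of recidivism rates across the two facility groups
rates = df.groupby("group")["recidivate"].mean()
print(f"Estimated effect: {rates['immediate'] - rates['delayed']:+.3f}")

# Balance check: are volunteers in the two groups equivalent before treatment?
for covariate in ["age", "prior_offenses"]:
    t, p = stats.ttest_ind(
        df.loc[df.group == "immediate", covariate],
        df.loc[df.group == "delayed", covariate],
        equal_var=False,
    )
    print(f"{covariate}: t = {t:.2f}, p = {p:.3f}")
```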

Does that make sense?

lecy commented 3 years ago

@ebossert if you want an example of a similar evaluation check out the Rikers Island evaluation of an anti-recidivism program:

https://nonprofitquarterly.org/what-we-learned-from-the-failure-of-the-rikers-island-social-impact-bond/

ebossert commented 3 years ago

@lecy Thank you for the feedback, it does make sense. I think that, of all the benefits or goals the program claims to have, recidivism is the one that can most easily be measured. Although I'm personally more interested in the drug recovery side, I also don't think that measuring relapse is feasible.

For the roll-out lag design, I'm wondering if the differences in demographics between treatment centers would be big enough to make a difference in equivalency. If so, I think I could control for that by choosing program centers in the same geographical area. Otherwise I believe the roll-out method would be the best option.

I had thought of another comparison group method, but in practice it's probably too expensive and time consuming:

  1. Gather a large treatment sample size from X number of different program sites
  2. Analyze the demographics of that group (mean age, % race makeup, % type of drug dependencies, etc.)
  3. Gather a greater number of people on the waiting list to enter the program (to ensure voluntarism) for the comparison group
  4. Compile the individual demographics of that comparison group
  5. Add/remove individuals until the demographic statistics closely resemble that of the treatment group.
lecy commented 3 years ago

@ebossert You definitely want to consider differences across facilities because the demographics of those in different treatment centers would vary by community, as well as risk factors once they are discharged. For example:

A = facility in a white, middle-class suburb
B = facility in a diverse, urban neighborhood

If you use one as the treatment facility and one as the control facility you would end up with biased results because it's unlikely that relapse rates would be the same prior to the intervention so comparing rates after the intervention would be misleading (T2-C2 post-test only estimator).

If you have 500 facilities and you randomly assign half to the treatment group then you will statistically average out the differences between facilities (the treatment and control groups should have similar proportions of suburban facilities).
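A quick simulation (with a hypothetical facility mix) illustrates why random assignment of many facilities tends to balance facility types across the two groups:

```python
import random

# Illustrative only: a made-up mix of 500 facilities, half assigned to treatment at random.
random.seed(1)
facilities = ["suburban"] * 200 + ["urban"] * 300

imbalances = []
for _ in range(1_000):
    random.shuffle(facilities)
    treat, control = facilities[:250], facilities[250:]
    # difference in the share of suburban facilities between the two groups
    gap = treat.count("suburban") / 250 - control.count("suburban") / 250
    imbalances.append(abs(gap))

print(f"Average absolute imbalance in share suburban: {sum(imbalances) / len(imbalances):.3f}")
```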

But that would be a huge and expensive study. You can be a little creative to overcome the weak estimator. For example, use one cohort of residents within a facility as the pre-treatment measure T1 and a later cohort as the post-treatment group T2. Similarly in the other facility that does not incorporate the new program use two cohorts as C1 and C2.

You don't have a pre-treatment measure for an individual (the concept of "readmission" or "recidivism" is not possible to measure before the person leaves the facility). But you can create a pre-treatment rate for a facility as long as the population it serves is pretty stable over time and the cohorts are large enough that statistically cohort 1 should be the same as cohort 2 (for example, same demographics, same sorts of substance abuse).

Now you have turned it into a reflexive problem - since you are comparing the facility to itself it is an apples-to-apples comparison.

With reflexive design the major threat to validity is secular trend. For example if drug markets are changing from marijuana to meth then long-term addiction rates might increase as well. Or decline in economic conditions and rise in unemployment might increase substance abuse. Changes to enforcement (legalization of marijuana) would significantly change relapse if you are measuring it through re-arrest.

Your comparison group would be used primarily to measure secular trend (C2-C1). And the diff-in-diff would remove trend from the measure of overall change T2-T1.
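To make the arithmetic explicit, here is the facility-cohort diff-in-diff with hypothetical recidivism rates plugged in:

```python
# Hypothetical cohort-level recidivism rates for the reflexive design described above.
T1, T2 = 0.42, 0.30   # treatment facility: cohort before the program, cohort after
C1, C2 = 0.45, 0.41   # comparison facility: two cohorts, no new program

secular_trend = C2 - C1            # change expected without the program
raw_change = T2 - T1               # change observed at the treatment facility
did_estimate = raw_change - secular_trend

print(f"Secular trend (C2-C1):  {secular_trend:+.2f}")
print(f"Raw change (T2-T1):     {raw_change:+.2f}")
print(f"Diff-in-diff estimate:  {did_estimate:+.2f}")
```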


I had thought of another comparison group method, but in practice it's probably too expensive and time consuming:

1. Gather a large treatment sample size from X number of different program sites.  
2. Analyze the demographics of that group (mean age, % race makeup, % type of drug dependencies, etc.). 
3. Gather a greater number of people on the waiting list to enter the program (to ensure voluntarism) for the comparison group.  
4. Compile the individual demographics of that comparison group.  
5. Add/remove individuals until the demographic statistics closely resemble that of the treatment group. 

You are essentially describing a test for happy randomization after random assignment, similar to what we did on Lab 2.

Since you can't use randomization for the project the alternative approach would be something like matching.

For every individual in a facility in the treatment group match them to an individual in a comparison (control group) facility. You match on measured demographic characteristics of the individuals to try to identify "twins". For example, each 35-year old male with income between $60k-$80k in the treatment group would be matched with another 35-year old male with income between $60k-$80k in the comparison group. The individuals that don't have twins are dropped from the study so that the data used to estimate program impact represents an apples-to-apples comparison.

Matching is often used in post-hoc analysis when you have lots of historic data on individuals and only a small proportion of the population received the treatment so you have lots of candidates in the search for twins.
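As a rough sketch of the "twins" idea, here is one simple way to do exact matching on coarsened demographics. Column names, bin widths, and file names are placeholders, and real studies typically use more covariates or nearest-neighbor / propensity score matching:

```python
import pandas as pd

def add_match_key(df: pd.DataFrame) -> pd.DataFrame:
    """Build a coarse matching key from demographics (bins are illustrative)."""
    df = df.copy()
    df["age_bin"] = (df["age"] // 5) * 5                  # 5-year age bands
    df["income_bin"] = (df["income"] // 20_000) * 20_000  # $20k income bands
    df["key"] = (
        df["sex"] + "_" + df["age_bin"].astype(str) + "_" + df["income_bin"].astype(str)
    )
    return df

# Hypothetical files: one row per individual, with age, sex, and income columns
treated = add_match_key(pd.read_csv("treated.csv"))
comparison = add_match_key(pd.read_csv("comparison.csv"))

# Keep only individuals whose "twin" key exists in the other group; drop the rest
shared = set(treated["key"]) & set(comparison["key"])
matched_t = treated[treated["key"].isin(shared)]
matched_c = comparison[comparison["key"].isin(shared)]

print(f"Matched {len(matched_t)} treated to {len(matched_c)} comparison individuals")
```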

Make sense?

ebossert commented 3 years ago

Thank you @lecy

There are only about 70-80 facilities so the random averaging is more feasible than with 500 facilities. Since there are multiple facilities throughout the US (subdivided into 6 regions) perhaps I could choose a representative facility from each region and from there do the comparison. Or I could focus on just one region and one facility to do the evaluation just to see if it is effective within those demographics. The treatment group would be a cohort of residents within a facility that were admitted into the program at time =1 and a later cohort would be those who were on the waiting list and admitted a year later at time =2, kind of similar to the psychiatric hospital study from Lab 03.

I'll look into using the pre-treatment rate for a facility to make sure all the factors you mentioned are covered and stable over the time period of the evaluation, since the reflexive design seems appropriate here. I think the time frame for this study would be short enough that secular trends shouldn't have a huge impact. The program usually lasts 12-18 months for each individual, so I imagine the evaluation could take place over 2-4 years (a rough estimate). However, I'm not sure if that's long enough to get accurate relapse or recidivism data from program participants.

I like the idea of matching to form the comparison group. There is a high demand for this program (only ~10% of applicants are admitted) so there would be a good chance I'd be able to find everyone a mate.

lecy commented 3 years ago

There is a high demand for this program (only ~10% of applicants are admitted) so there would be a good chance I'd be able to find everyone a mate.

If the program is over-subscribed can you convince them to use a lottery to determine the 10%? That would be robust.

If so great. If not then matching would be appropriate.

The program usually lasts 12-18 months for each individual, so I imagine the evaluation could take place over 2-4 years (a rough estimate). However, I'm not sure if that's long enough to get accurate relapse or recidivism data from program participants.

If you do a quick search on relapse I bet you can find some statistics on peak periods of relapse. The longer someone is out without a relapse event, the less likely they are to relapse. I would guess most cases occur within 6 months. For example, in the mental health chapter in Lab 3 most people that required readmission were back within a month. A time frame of 2-4 years post-release would result in high attrition or expensive costs to track down participants.

Since there are multiple facilities throughout the US (subdivided into 6 regions) perhaps I could choose a representative facility from each region and from there do the comparison. Or I could focus on just one region and one facility to do the evaluation just to see if it is effective within those demographics.

If you are trying to determine whether a program theory / program model works then you are interested in average effects across facilities. You complicate things if you want to add subgroup analysis or look for heterogeneous program effects across populations. That could be important and definitely worthwhile if you are trying to roll out a huge study across multiple states. If you are just trying to establish program effectiveness you could scale it back to a few facilities.

Focus on high internal validity first, then figure out generalizability. That way you know the basic program model works before you look at effectiveness in different contexts.