UI-Research / baby-bonds

Data analysis for the Baby Bonds project and estimation of models for DYNASIM.

Construct long-term trends in college education #4

Open dc0sic opened 10 months ago

dc0sic commented 10 months ago

Long-term trends in college education will be necessary for calibrating the college education model. These long-term trends should be disaggregated by sex, race/ethnicity, and age. One potential source of data is CPS ASEC.

judah-axelrod commented 9 months ago

Just pushed a Quarto notebook that uses ipumsr to get the CPS ASEC. The 1971-2023 data come to under 9 million observations (6.2 million after filtering out records with missing IDs or education data), and the notebook ran pretty quickly locally. A few notes here:

I generated a few counts by year at the bottom - happy to display some bar or scatterplots if there are specific cuts you're interested in seeing!

dc0sic commented 9 months ago

Thank you, Judah! I will review tomorrow, but just to quickly answer your questions:

dc0sic commented 9 months ago

Everything looks good. I have only one additional comment regarding race categories. Because the main purpose for the estimates in education trends is the calibration of our model, it is important to have the same race categories as we have there: Asian (non-Hispanic), Black (non-Hispanic), Hispanic, and White (non-Hispanic).

judah-axelrod commented 9 months ago

Just re-committed. Flagging that nlsy_lib.R codes AAPI as Other, but I left it separate in the diagnostic plots. I see that AAPI only begins in the late 1980s, so I can collapse it if necessary.

Also added time series plots where you can see the proportion of folks in each age, race, or sex group that has achieved each education milestone.

Lastly, I want to flag that there is a huge jump in the total sample size from 2000 (75K) to 2001 (123K); it doesn't seem to affect the proportions very much but might be worth keeping in mind. More info [here](https://www.census.gov/library/working-papers/2007/demo/POP-twps0080.html).

dc0sic commented 8 months ago

Good point about the AAPI. I will make this change in nlsy_lib.R.

I have a couple of comments on get_cps_asec.qmd. @judah-axelrod I know you have a limited amount of time, so don't feel like you have to complete all of this.

judah-axelrod commented 8 months ago

Sounds good! Two follow-up questions:

dc0sic commented 8 months ago

No need for replicate weights; individual weights are fine.

Regarding slopes, yes, a simple linear regression on year, or just add geom_smooth(method=lm, se=FALSE) (perhaps using dashed lines).

judah-axelrod commented 8 months ago

Updated to include individual weights and new plots of bachelor's by race, faceted by sex and age to get interactions + a smoothing curve. I added another view without the White race group so that we can zoom in on the y axis for the other 3 groups. Also added a regression if seeing specific coefficients is helpful!

Also, I set up the category as "Bachelor's and above" - not sure if a distinction between bachelor's and graduate degrees is important for validation purposes but can tweak categories as needed.

dc0sic commented 8 months ago

Thank you! Sorry if I wasn't clear, but when I said to focus on the BS degree I meant to plot only the share of people with a BS degree, not to restrict the sample to people with a BS degree. We want to see how, within each demographic group, the share of people who graduate from college changes over time.
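To make the distinction concrete, here is a minimal sketch of a weighted share computed over the full sample within each group. It is written in Python for illustration and is not the notebook's R code; the field names (`year`, `sex`, `race`, `weight`, `has_ba`) and the toy records are assumptions, not CPS ASEC variables or data.

```python
from collections import defaultdict


def weighted_share(records, group_keys=("year", "sex", "race")):
    """Weighted share of people with has_ba == True within each group.

    The denominator uses everyone in the group, so the sample is not
    restricted to degree holders.
    """
    num = defaultdict(float)  # summed weights of people with a bachelor's degree
    den = defaultdict(float)  # summed weights of everyone in the group
    for r in records:
        key = tuple(r[k] for k in group_keys)
        den[key] += r["weight"]
        if r["has_ba"]:
            num[key] += r["weight"]
    return {key: num[key] / den[key] for key in den if den[key] > 0}


# Toy records with hypothetical field names (not CPS data).
records = [
    {"year": 2000, "sex": "F", "race": "Hispanic", "weight": 100.0, "has_ba": True},
    {"year": 2000, "sex": "F", "race": "Hispanic", "weight": 300.0, "has_ba": False},
    {"year": 2001, "sex": "F", "race": "Hispanic", "weight": 150.0, "has_ba": True},
    {"year": 2001, "sex": "F", "race": "Hispanic", "weight": 350.0, "has_ba": False},
]
shares = weighted_share(records)
```

Each resulting value is one point on a time series plot: the share of the group holding a degree in that year.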

Regarding trends, at this point we're interested in whether they can be approximated by linear trends and, if so, what their slopes are. So let's do geom_smooth(method='lm', se=FALSE).

Regression coefficients will be useful, but we want a simple regression freq ~ YEAR estimated for each demographic group.
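A per-group regression of this kind might look like the sketch below, which fits freq = intercept + slope * YEAR separately for each group using closed-form ordinary least squares. It is a Python illustration rather than the notebook's R code; the function name `ols_by_group`, the `(group, year, freq)` row layout, and the toy values are assumptions.

```python
from collections import defaultdict


def ols_by_group(rows):
    """Fit freq = intercept + slope * year separately for each group.

    rows: iterable of (group, year, freq) tuples.
    Returns {group: (intercept, slope)}.
    """
    series = defaultdict(list)
    for group, year, freq in rows:
        series[group].append((year, freq))

    fits = {}
    for group, pts in series.items():
        n = len(pts)
        mean_x = sum(x for x, _ in pts) / n
        mean_y = sum(y for _, y in pts) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pts)
        if sxx == 0:
            continue  # a single year gives no slope to estimate
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
        slope = sxy / sxx
        fits[group] = (mean_y - slope * mean_x, slope)
    return fits


# Toy shares by year for two hypothetical groups (not CPS estimates).
rows = [
    ("A", 2000, 0.20), ("A", 2001, 0.21), ("A", 2002, 0.22),
    ("B", 2000, 0.10), ("B", 2001, 0.10), ("B", 2002, 0.13),
]
fits = ols_by_group(rows)
```

The slope for each group is the estimated change in the share per year, which is the quantity of interest if the trend is close to linear.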

judah-axelrod commented 8 months ago

Sorry about that, that makes sense! Updated to include all of this.

dc0sic commented 8 months ago

That's great, thank you!