Open dc0sic opened 10 months ago
Just pushed a Quarto notebook that uses ipumsr to get the CPS ASEC. Data from 1971-2023 is <9 million observations (or 6.2 million after filtering out missing IDs and education data), and it ran pretty quickly locally. A few notes here:
I generated a few counts by year at the bottom - happy to display some bar or scatterplots if there are specific cuts you're interested in seeing!
Thank you, Judah! I will review tomorrow, but just to quickly answer your questions:
Everything looks good. I have only one additional comment regarding race categories. Because the main purpose for the estimates in education trends is the calibration of our model, it is important to have the same race categories as we have there: Asian (non-Hispanic), Black (non-Hispanic), Hispanic, and White (non-Hispanic).
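For reference, collapsing to the four calibration categories could be sketched as below. This is only an illustration: `race_label` and `is_hispanic` are hypothetical column names standing in for whatever the notebook derives from the IPUMS race and ethnicity variables, and the labels are placeholders rather than actual IPUMS codes. Anyone outside the four groups is left as NA here, which is an assumption and not a decision made in this thread.

```r
# Hypothetical inputs: `race_label` and `is_hispanic` are illustrative names,
# not the actual IPUMS variables or codes.
collapse_race <- function(race_label, is_hispanic) {
  out <- rep(NA_character_, length(race_label))
  out[!is_hispanic & race_label == "White"] <- "White (non-Hispanic)"
  out[!is_hispanic & race_label == "Black"] <- "Black (non-Hispanic)"
  out[!is_hispanic & race_label == "Asian"] <- "Asian (non-Hispanic)"
  # Hispanic takes precedence over any race label.
  out[is_hispanic] <- "Hispanic"
  out
}

# Toy rows only, to show the mapping.
toy <- data.frame(
  race_label  = c("White", "Black", "Asian", "White", "Other"),
  is_hispanic = c(FALSE, FALSE, FALSE, TRUE, FALSE)
)
toy$race4 <- collapse_race(toy$race_label, toy$is_hispanic)
```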
Just re-committed. Flagging that nlsy_lib.R codes AAPI as Other, but I left it separate in the diagnostic plots. I see that AAPI only begins in the late 1980s, so I can collapse it if necessary.
Also added time series plots where you can see the proportion of folks in each age, race, or sex group that has achieved each education milestone.
Lastly, I want to flag that there is a huge jump in total sample from 2000 (75K) to 2001 (123K); it doesn't seem to affect the proportions very much but might be worth keeping in mind. More info [here](https://www.census.gov/library/working-papers/2007/demo/POP-twps0080.html).
Good point about the AAPI. I will make this change in nlsy_lib.R.
I have a couple of comments on get_cps_asec.qmd. @judah-axelrod I know you have a limited amount of time, so don't feel like you have to complete all of this.
Sounds good! Two follow-up questions:
ASECWT)?
No need for replicate weights; individual weights are fine.
Regarding slopes, yes, a simple linear regression on year, or just add geom_smooth(method=lm, se=FALSE) (perhaps using dashed lines).
Updated to include individual weights and new plots of bachelor's by race, faceted by sex and age to get interactions + a smoothing curve. I added another view without the White race group so that we can zoom in on the y axis for the other 3 groups. Also added a regression if seeing specific coefficients is helpful!
Also, I set up the category as "Bachelor's and above" - not sure if a distinction between bachelor's and graduate degrees is important for validation purposes but can tweak categories as needed.
Thank you! Sorry if I wasn't clear, but when I said to focus on the BS degree I meant to plot only the share of people with a BS degree, not to restrict the sample to people with a BS degree. We want to see how, within each demographic group, the share of people who graduate from college changes over time.
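To make the "share within each group" idea concrete, here is a minimal base R sketch using the individual weight. `YEAR` and `ASECWT` are the names used in this thread; `sex` and `has_ba` are hypothetical columns, and the rows are made up purely to show the arithmetic.

```r
# Toy person-level rows (made up). The full sample is kept, and `has_ba`
# flags who holds a bachelor's degree or above.
cps <- data.frame(
  YEAR   = c(2000, 2000, 2000, 2001, 2001, 2001),
  sex    = c("F", "F", "F", "F", "F", "F"),
  has_ba = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  ASECWT = c(100, 300, 100, 200, 200, 400)
)

# Weighted share per YEAR x sex cell:
#   sum(weight * indicator) / sum(weight)
cps$wt_ba <- cps$ASECWT * cps$has_ba
cells <- aggregate(cbind(wt_ba, ASECWT) ~ YEAR + sex, data = cps, FUN = sum)
cells$freq <- cells$wt_ba / cells$ASECWT
```

The denominator is everyone in the cell, so `freq` is the share of the group with a degree, not a count among degree holders.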
Regarding trends, at this point we're interested in whether they can be approximated by linear trends and, if so, what their slopes are. So let's do geom_smooth(method='lm', se=FALSE).
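A rough sketch of what that overlay might look like, assuming a data frame of yearly shares with columns `YEAR`, `race`, and `freq`. The values and group names below are invented for illustration and are not CPS estimates.

```r
library(ggplot2)

# Made-up yearly shares for two placeholder groups.
shares <- data.frame(
  YEAR = rep(c(1990, 2000, 2010, 2020), times = 2),
  race = rep(c("Group A", "Group B"), each = 4),
  freq = c(0.10, 0.14, 0.17, 0.22, 0.20, 0.24, 0.29, 0.33)
)

# Observed series as solid lines, per-group linear fit as dashed lines.
p <- ggplot(shares, aes(x = YEAR, y = freq, colour = race)) +
  geom_line() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dashed") +
  labs(x = "Year", y = "Share with a bachelor's degree")
```

Adding something like `facet_grid(sex ~ age_group)` (hypothetical column names) would give the sex-by-age panels mentioned earlier in the thread.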
Regression coefficients will be useful, but we want a simple regression freq ~ YEAR estimated for each demographic group.
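One way to fit that regression separately per group is sketched below in base R. The `group` column and the numbers are placeholders (in practice the grouping would be the sex, race, and age cells), so treat this as an illustration of the pattern rather than the notebook's actual code.

```r
# Made-up shares for two placeholder groups.
shares <- data.frame(
  YEAR  = rep(c(2000, 2010, 2020), times = 2),
  group = rep(c("A", "B"), each = 3),
  freq  = c(0.10, 0.15, 0.20, 0.30, 0.32, 0.34)
)

# Fit freq ~ YEAR within each group, then pull out the YEAR slope.
fits <- lapply(split(shares, shares$group), function(d) lm(freq ~ YEAR, data = d))
slopes <- vapply(fits, function(m) unname(coef(m)["YEAR"]), numeric(1))
```

`slopes` is a named vector with one entry per group, giving the estimated change in share per year.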
Sorry about that, that makes sense! Updated to include all of this.
That's great, thank you!
Long-term trends in college education will be necessary for calibrating the college education model. These long-term trends should be disaggregated by sex, race/ethnicity, and age. One potential source of data is CPS ASEC.