EnergyInnovation / eps-us

Energy Policy Simulator - United States
GNU General Public License v3.0

Disaggregate job impacts into demographic categories. #155

Closed robbieorvis closed 3 years ago

robbieorvis commented 3 years ago

This would disaggregate job impacts into different demographic categories, such as income, gender, and race or ethnicity.

We should explore what the right methodology is, but one example is to just maintain existing breakdowns by ISIC code and carry these forward, using data from BEA.

Original email:

You may remember that one of our EPS outputs is change in union jobs and change in non-union jobs. This is calculated based on the change in jobs in every ISIC code multiplied by the present-day share of jobs in that ISIC code that are represented by a union. It was a very straightforward and easy output to make (once we had the change in jobs by ISIC code).

Do you think it would be helpful to add similar outputs for:

• Change in jobs by income bracket
• Change in jobs by gender
• Change in jobs by race or ethnicity

In each of these cases, the only new input data we would need is the percentage of jobs in each ISIC code held by people of each type (each income bracket, each gender, etc.).
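As a minimal sketch of the share-based approach described above (the same arithmetic as the union vs. non-union output), the change in jobs in each ISIC code is multiplied by the share of that industry's jobs held by each group, then summed across industries. The function name, industry labels, and all numbers below are hypothetical placeholders, not EPS variable names or real data.

```python
# Hypothetical illustration: industry labels, job changes, and shares are made up.
def disaggregate_job_changes(job_change_by_isic, share_by_isic):
    """Split each industry's job change across groups using fixed shares.

    job_change_by_isic: {industry: change in jobs}
    share_by_isic:      {industry: {group: share of that industry's jobs}}
    Returns {group: total change in jobs across all industries}.
    """
    totals = {}
    for isic, delta_jobs in job_change_by_isic.items():
        for group, share in share_by_isic[isic].items():
            totals[group] = totals.get(group, 0.0) + delta_jobs * share
    return totals


job_change_by_isic = {"ISIC_A": 1000.0, "ISIC_B": -400.0}
share_by_isic = {
    "ISIC_A": {"women": 0.25, "men": 0.75},
    "ISIC_B": {"women": 0.5, "men": 0.5},
}

result = disaggregate_job_changes(job_change_by_isic, share_by_isic)
print(result)
```

Because the shares within each industry sum to 1, the group totals add back up to the economy-wide change in jobs, so the new output would not alter any existing result.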

For the United States, I believe BLS already includes breakdowns of workers by income bracket and by gender, probably by industry. I don’t know if they offer race/ethnicity breakdowns. However, the U.S. Census tends to report lots of things with race/ethnicity breakdowns, and they might have data on workers by industry.

I haven’t looked for anything on this for non-U.S. countries. But this is just a new output, not needed for internal calculation or the accuracy of any other output. So if data cannot be found, it would be possible to simply disable the new output graphs in the Web App for a non-U.S. country.

If we add this feature, the new input variables likely should be time-series, so that people can simulate shifts in the future. For instance, most software engineers today are male, but I believe that is changing, with the female proportion growing every year. Projecting how the percentages will change in the future might require additional data beyond what we can get from BLS and the Census (or we simply make assumptions).

More:

Well, we did have an outside expert review the methodology for union vs. non-union jobs, which is the same methodology as proposed to handle race/ethnicity and gender. (However, income is more complicated – see below.) That expert had no objections to the methodology we’re using for union membership. She did bring up the idea of the BAU input data file potentially projecting future changes in union representation percentage, but if you extrapolate from the past (which is the best methodology she could come up with to do this), you get the result that union membership declines to zero in every industry. We felt that was not realistic and also would send a negative political message that has nothing to do with energy policy or the policies in the simulator. So we rejected that suggestion and chose to use today’s constant shares of union vs. non-union jobs in each industry, which we felt produces the most accurate metric for policymakers.

If we do add these output graphs, we could simply use today’s shares of gender and race percentages by industry, exactly as we do for the union vs. non-union graph, and update the percentages as new data become available. (Also, we don’t have future projections of the key I/O variables that play into the jobs calculation, so keeping constant shares here would be in line with what we’re already doing in variables like DLIM and BObIC.) I notice the BLS has race/ethnicity and gender data by industry here, updated monthly, so these two graphs would be easy to put together in a matter of hours. In addition to a simple total, we might consider versions that normalize against the share of the population that has that trait (for example, “new Asian-held jobs per 100,000 Asian people in the population”) to understand whether jobs are disproportionately going toward any specific group. Otherwise, it might look like whichever group is in the majority is always the winner, even if they actually got fewer of the new jobs in percentage terms.
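The normalization idea can be sketched as follows, assuming we already have the new jobs attributed to each group and a population count for each group. The group names and numbers are hypothetical and chosen only to show how the per-capita view can differ from the simple total.

```python
# Hypothetical illustration: group names, job counts, and populations are made up.
def jobs_per_100k(new_jobs_by_group, population_by_group):
    """Express each group's new jobs per 100,000 people in that group."""
    return {
        group: 100_000 * new_jobs_by_group[group] / population_by_group[group]
        for group in new_jobs_by_group
    }


new_jobs = {"group_x": 9000.0, "group_y": 1500.0}
population = {"group_x": 30_000_000, "group_y": 3_000_000}

rates = jobs_per_100k(new_jobs, population)
print(rates)
```

In this made-up case the larger group receives more jobs in absolute terms (9,000 vs. 1,500), but the smaller group receives more per 100,000 people (50 vs. 30), which is the effect the normalized version is meant to reveal.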

Unlike race/ethnicity/gender, income might not need new input data. We already have the total employment and the total employee compensation in every industry, so we can divide them to find the average compensation of a worker in each industry, then see if the industries with high compensation per worker are gaining or losing jobs, and the same for industries with medium or low average compensation per worker. This isn’t a perfect measure of job quality: an industry that has 10 people earning $100,000 each would have virtually the same mean compensation as an industry with 1 person earning $1,000,000 and 9 people earning $1. But in the real world, it might be a sufficiently good metric, since the industries that tend to have high income inequality between different roles within the company (say, discount retail chain CEO vs. store workers) tend to need large numbers of low-earning workers, which would pull down the mean. So having low mean worker compensation might be a decent way to capture what we mean when we speak of “job quality” in the real world. But it is true that this adds a complicating factor to income outputs that we don’t have to deal with for race/ethnicity or gender outputs, so race/ethnicity/gender outputs might engender higher confidence.
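A rough sketch of the income approach, under stated assumptions: divide total compensation by total employment in each industry, assign the industry to a low, medium, or high tier, and sum the job changes within each tier. The tier cutoffs, industry labels, and figures are hypothetical; the discussion does not specify how the brackets would be drawn.

```python
# Hypothetical illustration: industries, figures, and tier cutoffs are made up.
def job_change_by_compensation_tier(employment, compensation, job_change,
                                    low_cutoff, high_cutoff):
    """Sum job changes over industries grouped by mean compensation per worker."""
    tiers = {"low": 0.0, "medium": 0.0, "high": 0.0}
    for industry, workers in employment.items():
        mean_comp = compensation[industry] / workers  # average pay per worker
        if mean_comp < low_cutoff:
            tier = "low"
        elif mean_comp < high_cutoff:
            tier = "medium"
        else:
            tier = "high"
        tiers[tier] += job_change[industry]
    return tiers


employment = {"ind_a": 1000, "ind_b": 2000, "ind_c": 500}
compensation = {"ind_a": 30_000_000, "ind_b": 120_000_000, "ind_c": 60_000_000}
job_change = {"ind_a": -50.0, "ind_b": 200.0, "ind_c": 75.0}

tiers = job_change_by_compensation_tier(
    employment, compensation, job_change, low_cutoff=40_000, high_cutoff=90_000
)
print(tiers)
```

This classifies whole industries rather than individual workers, so it inherits the caveat noted above: the mean hides the distribution of pay within an industry.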

Robbie response: I think these outputs would be very useful, but I am quite concerned about using the existing union methodology (or even something similar) to estimate this without doing a bit more research first. It’s true that our reviewer didn’t have any criticism, but that’s different from an endorsement, and it also doesn’t necessarily extend to the new metrics we are looking at. These outputs would be highly scrutinized since they are in such high demand and are so politically sensitive, so I’d want to make sure we feel really good about our methodology and that it is very defensible.

Before heading down this path, I’d like us to research how other economists and economic models estimate these things, to see if we gain some insight there and possibly replicate methodologies.

One example is the employment methodology from the Princeton Net Zero America study: https://netzeroamerica.princeton.edu/img/Annex%20R.%20(NZA)%20Labor%20transitions%20methodology%20draft%202-17-21.pdf

Here, it does look like the modelers either held historical shares constant or projected future changes based on historical trends. They didn’t measure all the things we are talking about here, but they did measure some of them (as an aside, Jeff, I think you should read the linked appendix because it may generate some ideas on cool new features/improvements for our employment estimates).

jrissman commented 3 years ago

Okay, it is difficult to find other sources that have disaggregated employment impacts by race or gender, which adds to my sense that providing this capability in the EPS provides real value that is hard to come by elsewhere.

The best source I've found is from the BLS, which does project employment forward for 10 years into the future disaggregated by demographic factors including race and gender, using a computer model. Their description of the methodology can be found in:

“Employment projections,” Handbook of Methods (U.S. Bureau of Labor Statistics), https://www.bls.gov/opub/hom/emp/pdf/emp.pdf

The relevant section is on pages 8-9, which I will quote here:

Labor force

Projections of the future supply of labor are calculated by applying BLS labor force participation rate projections to population projections produced by the Census Bureau. The Census Bureau carries out long-term projections of the resident U.S. population. The projection of the resident population is based on the current size and composition of the population and includes assumptions about future fertility, mortality, and net international migration. BLS analysts then convert the resident population concept of the decennial census to the civilian noninstitutional population concept of the BLS Current Population Survey (CPS). This takes place in three steps. First, the population of children under age 16 is subtracted from the total resident population. Then, the population of the Armed Forces, by age, gender, race and ethnic categories, is subtracted out. Finally, the institutional population is subtracted from the civilian population for all the different categories.

BLS maintains a database of annual averages of labor force participation rates provided by CPS for various age, gender, race and ethnic groups. BLS analysts examine trends and the past behavior of participation rates for each of the categories. This is accomplished by first smoothing the historical participation rates for these groups. Next the smoothed data are transformed into logits, or the natural logarithm of the odds ratio. Then, the logits of the participation rates are extrapolated linearly by regressing them against time and then extending the fitted series to or beyond the target year. When the series are transformed back into participation rates, the final projected paths are nonlinear.

In addition, projected labor force participation rates are reviewed for consistency. Reviews are conducted on the time path, the cross section in the target year, and the cohort patterns of participation, and, if necessary, modifications are made. Projected labor force participation rates are then applied to the projected civilian noninstitutional population, producing labor force projections for each of the age, gender, race, and ethnic groups. Finally, these groups’ values are summed to obtain the total civilian labor force, a key input into projecting the macroeconomy, which is the next step.
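The logit extrapolation step in the quoted passage can be sketched roughly as below. This is an illustration of the general technique only, not BLS's actual code: it skips the smoothing and consistency-review steps, uses a plain least-squares line, and the participation rates are made-up numbers.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a logit back to a rate strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def extrapolate_rates(years, rates, target_years):
    """Fit a straight line to logit(rate) vs. year, then project target years."""
    z = [logit(r) for r in rates]
    n = len(years)
    mean_x = sum(years) / n
    mean_z = sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxz = sum((x - mean_x) * (zi - mean_z) for x, zi in zip(years, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return [inv_logit(intercept + slope * y) for y in target_years]


# Hypothetical historical participation rates for one demographic group.
years = [2016, 2017, 2018, 2019, 2020]
rates = [0.40, 0.41, 0.42, 0.43, 0.44]

projected = extrapolate_rates(years, rates, [2030, 2050])
print(projected)
```

Working in logit space keeps every projected rate between 0 and 1, and the back-transformed path is nonlinear even though the fitted line is straight, which matches the behavior the handbook describes.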

I don't think we need to recreate this methodology, because BLS has done it for us. I just think we should take BLS's projections and extend forward to 2050 using an intelligent curve fit. We can flag in the documentation somewhere, if we want, that the methodology becomes less certain the farther out we go, especially after 10 years out, when we have to start extending the BLS data via curve fit.

The BLS has annual data series at https://www.bls.gov/emp/data/labor-force.htm

I'm going to continue working on this today and hopefully will build a working version.

jrissman commented 3 years ago

Also, here's a link to the sex and race breakdown table by industry from the BLS CPS: https://www.bls.gov/cps/cpsaat18.htm

We still need a source like this to apportion changes by industry.

jrissman commented 3 years ago

Completed in commits 8352971, c4f3b0e, c7a034b, 3e66d71.

We have four types of demographic traits we handle: sex, race, Hispanic or Latino status, and age bracket. I know age bracket wasn't initially part of this, but the BLS had the data in essentially the same format, so as long as I was doing the work to add the others, I figured I might as well include age brackets almost "for free."

It's running right now on the staging server in the "develop" branch if you want to check it out.