Refine jobs demographic calculations

Current Approach

As of 3.3.0, the (beta) approach to allocating changes in jobs to demographic traits relies on national average data for each ISIC code. Essentially, we:

Take BLS data on the share of each ISIC code that is occupied by workers with each demographic trait in the present year
Turn it into a future time series by factoring in BLS projections of changes in demographic composition of the entire workforce over time (because BLS doesn't have projections of changes in the demographic composition of each ISIC code over time). We map workforce-wide demographic changes down onto individual ISIC codes using a careful methodology (involving S-curves to respect asymptotes at 0% and 100%).
Multiply the policy-driven change in jobs for each ISIC code by the share of each ISIC code with each demographic trait.
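
The S-curve methodology is not spelled out here. As a minimal sketch, one way to keep projected shares between 0% and 100% is to apply the workforce-wide change in logit (log-odds) space. The logistic form, the function names, and the example numbers below are assumptions for illustration, not the EPS implementation.

```python
import math

def logit(p):
    """Log-odds of a share p, which must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps any real number back to a share in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def project_share(isic_share_now, workforce_share_now, workforce_share_future):
    """Shift an ISIC code's share by the workforce-wide change in log-odds.

    Assumes both workforce-wide shares lie strictly between 0 and 1.
    An ISIC share already at 0 or 1 is returned unchanged.
    """
    if isic_share_now <= 0.0 or isic_share_now >= 1.0:
        return isic_share_now
    shift = logit(workforce_share_future) - logit(workforce_share_now)
    return inv_logit(logit(isic_share_now) + shift)

def jobs_change_by_trait(policy_change_in_jobs, projected_share):
    """Allocate a policy-driven change in jobs to one demographic trait."""
    return policy_change_in_jobs * projected_share

# Hypothetical numbers: an ISIC code that is 90% one trait today, while the
# workforce-wide share of that trait rises from 10% to 30%.
future_share = project_share(0.90, 0.10, 0.30)
trait_jobs = jobs_change_by_trait(1000, future_share)
```

Because the shift is applied to log-odds rather than to the share itself, an ISIC code that is already close to 100% moves only slightly and never crosses the asymptote.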

Flaws

This methodology isn't terrible when used at national scale to reflect broad economy-wide shifts (which is what the EPS is meant for), but there are some sources of inaccuracy. Here are a few concerns:

Two different policies that grow the same ISIC code might create those jobs in different places (and hence, with different demographics). For example, a policy that causes refineries to be constructed might create construction jobs on the Gulf Coast, while a policy that causes solar panels to be constructed might create jobs in the sunny Southwestern deserts. Since the demographics of construction workers differ between the Gulf Coast and the Southwestern deserts, the demographics of the created jobs would be different for these two policies.
Aside from geography, there may be policy design features that can direct jobs toward or away from specific demographics, such as an explicit DEI focus in the policy. The EPS currently does not have a policy lever or levers to model DEI-focused policy features.
In the BAU case, some industries will inevitably undergo demographic shifts that are not well reflected by the demographic shifts in the overall workforce (mapped onto that industry). For example, an ISIC code today might be concentrated in places with specific demographics, but its future growth (even in the BAU case) might occur in places with very different demographics, shifting that ISIC code's demographics differently from those of the total workforce. The same effect could arise for non-geographic reasons. For instance, suppose an ISIC code has a concentration of workers with a particular demographic trait today, perhaps because immigrants were attracted to and welcomed into those businesses. Future growth of that industry may be less skewed toward that demographic, as people with that trait become more settled in the country and the ISIC code increasingly recruits from the general U.S. population instead of from that specific immigrant population.

It is not possible to predict future demographic breakouts of jobs perfectly, but there may be some steps we can take to reduce inaccuracy. There are a few approaches worth considering. They are not mutually exclusive.

Approach 1: Use County-Level Geography (or State-level, if county level is unavailable)

This approach does not use national trends to scale future changes. Instead, it looks at the demographic composition of each county (or state) and uses the changes in demographics in those counties (or states). It can optionally also address the possibility of industries changing their geographic focus in the future, if you have input data on that topic.

1. For each ISIC Code, obtain input data about the number of jobs (or amount of output) in that ISIC code in each county
2. Obtain Census data about the demographic makeup of each county (we already have these data in the EPS to help with public health impacts)
3. Weight each county's demographic percentages by the share of that ISIC code that occurs in that county. (This assumes an employer generally employs workers with demographics similar to those of the population in the county where the work is done.) Sum them up to get a national percentage of people with each demographic trait in that ISIC code.
4. Compare the national share obtained by weighted averaging of counties (above) to the known (100% accurate) present-day national share by ISIC Code from BLS. Adjust each county proportionately so that the weighted average of all counties equals the national share. For example, for a given ISIC code, the raw weighted average may be 13% Black, and the known national share for that ISIC Code may be 13.5% Black. Adjust all counties' shares of Black workers upward proportionately to their existing shares such that the weighted average comes out to be 13.5%.
5. If you have county-level projected changes in demographic makeup of the population, project changes in the demographic share of each ISIC code's workers in each county on the basis of these county-wide population changes. If you do not have county-level projected changes in demographic makeup, leave the present-day demographic shares of each ISIC code in each county constant.
6. If you have county-level projected changes in where ISIC codes will be concentrated in the future (i.e. in output or jobs), adjust the weights (the ISIC code's jobs or output) assigned to each county in each future year accordingly. If you do not have data on how industries' geographic focus will change in the future, leave present-day values in place.
7. In each modeled year, for each ISIC code, take a weighted average of the demographic shares of that ISIC code in each county, weighted by the concentration of that ISIC code in that county. The result is the national average share of that ISIC code with that demographic trait in that year.
8. When a policy package causes an increase (or decrease) in an ISIC code, multiply the change in jobs by the national average shares found in the step above.
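
The weighting, calibration, and final multiplication in the procedure above can be sketched as follows. This is an illustrative sketch only: the county names, weights, and function names are hypothetical, and it handles a single ISIC code and a single demographic trait. The 13% and 13.5% figures follow the example in the calibration step.

```python
def weighted_share(county_shares, county_weights):
    """National share of a trait: county shares weighted by the ISIC code's
    jobs (or output) in each county."""
    total_weight = sum(county_weights.values())
    return sum(county_shares[c] * county_weights[c] for c in county_weights) / total_weight

def calibrate_to_national(county_shares, county_weights, national_share):
    """Scale every county's share by the same factor so that the weighted
    average equals the known national share. (A scaled share above 1.0
    would need separate handling; that case is not addressed here.)"""
    raw = weighted_share(county_shares, county_weights)
    factor = national_share / raw
    return {c: s * factor for c, s in county_shares.items()}

# Hypothetical inputs for one ISIC code and one trait.
county_weights = {"county_a": 70.0, "county_b": 30.0}   # jobs in this ISIC code
county_shares = {"county_a": 0.10, "county_b": 0.20}    # population share with the trait

raw_national = weighted_share(county_shares, county_weights)          # 0.13
calibrated = calibrate_to_national(county_shares, county_weights, 0.135)
calibrated_national = weighted_share(calibrated, county_weights)      # 0.135

# Final step: allocate a policy-driven change in jobs to the trait.
policy_change_in_jobs = 1000.0
jobs_with_trait = policy_change_in_jobs * calibrated_national
```

For future years, the same `weighted_share` call would be repeated with that year's county shares (if county-level demographic projections exist) and that year's county weights (if projections of industry location exist).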

Note that for the procedure above, if you have neither the optional input data requested in step 5, nor the optional input data requested in step 6, this approach probably does not add any value or accuracy relative to the current approach. (It might actually be less accurate, since it removes the scaling by shifts in national workforce makeup.) So you should likely consider having the data requested in steps 1, 2, and either 5 or 6 (ideally both 5 and 6) as prerequisites for using this method.

Approach 2: Use Projections of Demographic Changes by ISIC Code for Specific Cash Flow Types

Instead of using geography, this approach breaks apart the changes in cash flows going into each ISIC code to help estimate the demographics of the jobs created/lost due to those cash flow changes.

Each ISIC code in the I/O model receives a change in output caused by policies, from several sources (i.e. changes in industry spending, government spending, household spending). Within changes in industry spending, there are direct and indirect effects. We will likely leave the impacts of government and household spending (induced impacts) alone, and also likely leave alone business spending on inputs (indirect impacts).
For the direct impacts, for each ISIC code, divide the cash flows into meaningful buckets depending on where they came from. For example, construction of each type of power plant might be its own bucket. It may be structurally difficult to ensure all such cash flows are captured cleanly, since today they are summed on the sector-specific sheets, and the expenditures take diverse forms and use various different subscripts.
Obtain input data on the demographic shares of workers who would receive jobs due to particular types of cash flows. For example, the data should stipulate that for building solar plants, the resulting jobs nationally will be x% of each race, y% of each gender, z% of Hispanic or Latino ethnicity, and q% of each age bracket. These data may be hard to find for most cash flow buckets. For any cash flow bucket where you don't have these data, assume job changes have demographics reflecting the average for that ISIC code. Similarly, for the induced and indirect spending, assume job changes have demographics reflecting the average for the ISIC code being spent on.
Multiply each cash flow change (including the induced/indirect ones) by the demographic trait percentages for that cash flow type. Divide by total cash flow change to get a weighted average. (Be careful here when handling negative cash flow changes.) This gives a percentage of the policy-driven change in jobs for each ISIC code that has each demographic trait.
Multiply the change in jobs by the demographic trait percentages found in the step above.
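
A rough sketch of the weighted-average step for one ISIC code is shown below. The bucket names, shares, and function name are hypothetical. Buckets without their own demographic data fall back to the ISIC code's average shares, and the sketch returns `None` when positive and negative cash flow changes cancel out, since the weighted average is undefined in that case. How the model should actually treat that case is an open design question, not something this sketch settles.

```python
def blended_shares(cash_flow_changes, bucket_shares, isic_average_shares, tol=1e-9):
    """Cash-flow-weighted demographic shares for one ISIC code.

    cash_flow_changes:   {bucket: change in cash flow (may be negative)}
    bucket_shares:       {bucket: {trait: share}} for buckets with data
    isic_average_shares: {trait: share} used for any bucket without data
    """
    total = sum(cash_flow_changes.values())
    if abs(total) < tol:
        # Offsetting changes: dividing by the total is not meaningful.
        return None
    blended = {}
    for trait in isic_average_shares:
        weighted = sum(
            change * bucket_shares.get(bucket, isic_average_shares)[trait]
            for bucket, change in cash_flow_changes.items()
        )
        blended[trait] = weighted / total
    return blended

# Hypothetical inputs: two direct buckets with data, one indirect bucket without.
cash_flow_changes = {"solar_construction": 80.0,
                     "refinery_construction": -20.0,
                     "indirect": 40.0}
bucket_shares = {"solar_construction": {"female": 0.30},
                 "refinery_construction": {"female": 0.10}}
isic_average_shares = {"female": 0.20}

blended = blended_shares(cash_flow_changes, bucket_shares, isic_average_shares)
# (80 * 0.30 - 20 * 0.10 + 40 * 0.20) / 100 = 0.30

change_in_jobs = 500.0
female_jobs = change_in_jobs * blended["female"]
```

With mixed signs, a blended share can fall outside the 0 to 1 range even when the total is not zero. That can be a legitimate result (one group gains jobs while the ISIC code as a whole loses them), but it should be checked rather than clipped silently.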

This approach requires structural work in Vensim because it adapts the demographic changes to the magnitudes of the cash flow changes of different types. This is different from Approach 1, which can be done entirely in Excel.

Both approaches can be used together, as they target different sources of uncertainty.

The structural work needed for Approach 2 could be ugly when it comes to summing specific cash flow types from sectors. After much refinement, we now do this summing within each sector, which helps us avoid dealing with different types of subscripts and formats in the Cross-Sector Totals page. We want to keep this paradigm to prevent the EPS from growing too complex to be understandable and debuggable. This means either not using Approach 2, or else bringing demographic subscripts into the mix at the stage of direct cash flow assignment to ISIC codes within each sector (i.e. at the same time as a direct expenditure in a sector is assigned to ISIC codes, it is simultaneously assigned demographic shareweights). The nationwide demographic averages by ISIC code would then be used for all cash flow changes to that ISIC code that were not assigned shareweights within the various sectors.

Approach 3: Add Policy Lever(s) to Affect Demographics Directly

This might be made its own GitHub issue. One or more policies should be added to directly alter the calculated demographic percentages of new jobs. A good way to do this is to provide a lever subscripted by ISIC code that alters the final (calculated) demographic trait percentages by user-specified amounts for that ISIC code. This lever should only affect the jobs created (or lost) due to policy, not the jobs that exist in the BAU scenario.
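
As one possible sketch of such a lever, the code below adds user-specified amounts to the calculated shares for a single ISIC code and applies the result only to policy-driven jobs. The function names and numbers are hypothetical. The sketch also assumes the traits passed in are mutually exclusive categories that sum to 100% (for example, the categories of a single trait such as gender), so it clamps and renormalizes after the adjustment; that renormalization is an assumption, not part of the proposal.

```python
def apply_demographic_lever(calculated_shares, lever_adjustments):
    """Shift calculated shares by user-specified amounts, clamp each to
    [0, 1], then renormalize so the categories again sum to 1."""
    adjusted = {
        trait: min(1.0, max(0.0, share + lever_adjustments.get(trait, 0.0)))
        for trait, share in calculated_shares.items()
    }
    total = sum(adjusted.values())
    if total == 0.0:
        raise ValueError("lever settings drive every share to zero")
    return {trait: value / total for trait, value in adjusted.items()}

def total_jobs_by_trait(bau_jobs, policy_change_in_jobs, bau_shares, policy_shares):
    """BAU jobs keep the unadjusted shares; only the policy-driven change
    in jobs uses the lever-adjusted shares."""
    return {
        trait: bau_jobs * bau_shares[trait] + policy_change_in_jobs * policy_shares[trait]
        for trait in bau_shares
    }

# Hypothetical example for one ISIC code.
calculated = {"male": 0.70, "female": 0.30}
lever = {"male": -0.10, "female": 0.10}

policy_shares = apply_demographic_lever(calculated, lever)
jobs = total_jobs_by_trait(10000.0, 1000.0, calculated, policy_shares)
```

In the model itself the lever would be subscripted by ISIC code, so a function like this would be evaluated once per ISIC code with that code's lever settings.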
