bdecon / econ_data

Python 3 examples of using economic data APIs and working with economic microdata. Includes bd CPS.

bd CPS: possible addition of O*Net information on physically demanding jobs #87

Closed bdecon closed 3 years ago

bdecon commented 5 years ago

I'd love to know which jobs are physically demanding and which aren't. This can be done, from what I understand, by matching occupation codes with data contained in the O*Net resource. As a low priority/placeholder, look into this.

bdecon commented 5 years ago

See:

Bucknor and Baker (2016)

Rho (2010)

Johnson, Mermin, Resseger (2007)

O*Net database

In terms of implementation, it's easy to convert the 4-digit occupation codes in the CPS to SOC codes. I know the SOC codes generally match O*Net, though in some cases it looks like the O*Net occupations are more specific than the CPS occupations, which could be an issue.

It seems like a dictionary of CPS occupations and their O*Net equivalents could then be used to identify job characteristics. The CEPR papers list which characteristics are defined as physically demanding (PD) or as difficult working conditions (DWC). From there it would just be about identifying the list of CPS occupations that are PD or DWC.
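The dictionary idea could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `census_to_soc`, `build_cps_to_onet`, and the sample codes are hypothetical placeholders (real values would come from the Census and O*Net crosswalk files), and it assumes each O*Net code is a SOC code plus a two-digit suffix, for example `15-1199.04`.

```python
from collections import defaultdict


def soc_from_onet(onet_code):
    """Drop the O*Net suffix, e.g. '15-1199.04' -> '15-1199'."""
    return onet_code.split(".")[0]


def build_cps_to_onet(census_to_soc, onet_codes):
    """Map each Census occupation code to the O*Net codes under its SOC code."""
    by_soc = defaultdict(list)
    for code in onet_codes:
        by_soc[soc_from_onet(code)].append(code)
    # Census codes with no O*Net match get an empty list
    return {
        census: sorted(by_soc.get(soc, []))
        for census, soc in census_to_soc.items()
    }


# Illustrative values only, not checked against the actual crosswalks
census_to_soc = {"1107": "15-1199", "6260": "47-2061"}
onet_codes = ["15-1199.04", "15-1199.05", "47-2061.00"]

cps_to_onet = build_cps_to_onet(census_to_soc, onet_codes)
print(cps_to_onet)
```

A Census code that maps to more than one O*Net code is exactly the case where O*Net is more specific than the CPS.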

I'm sure there's more to it than this, but so far it doesn't seem like it would be a huge lift to do it for one year or one month of data. A bigger challenge might come from revisions to the occupation codes and to the O*Net data, both past and future. For this reason, it might not be feasible to have a DWC and PD variable for all years, or to have it generated automatically in the future. That is, I could use the latest O*Net database to know which jobs are currently PD or DWC, but it would be harder to know which jobs in 1995 were PD or DWC, especially in a meaningful way that would allow me to identify trends.

bdecon commented 5 years ago

From Hye Jin:

http://economics.mit.edu/faculty/dautor/data/acemoglu

Crosswalks:

https://www.onetcenter.org/crosswalks.html#soc

http://ibs.org.pl/en/resources/occupation-classifications-crosswalks-from-onet-soc-to-isco/

bdecon commented 5 years ago

Potentially promising: O*Net API: https://services.onetcenter.org/reference/online/occupation

EDIT: Requires registration that would make it hard to replicate--probably not the right direction.

bdecon commented 5 years ago

Database archive: https://www.onetcenter.org/db_releases.html

bdecon commented 5 years ago

From Rho (2010):

Selected job characteristics from O*NET are used to define jobs that are physically demanding or have difficult working conditions. Jobs are considered to be highly physically demanding if they involve any of the following elements: dynamic strength, explosive strength, static strength, trunk strength, bending or twisting, kneeling or crouching, quick reaction time, or gross body equilibrium. In addition to these measures, if jobs involve performing more general physical activities, handling and moving objects, or demand workers to spend significant time standing, walking and running, or making repetitive motions, they are considered as having any physical demand. Difficult working conditions are defined as cramped workspace, labor outdoors (exposed to the weather or covered) or indoors in not environment-controlled conditions, or exposure to abnormal temperatures, contaminants, hazardous conditions, hazardous equipment, or distracting or uncomfortable noise.

bdecon commented 5 years ago

From Johnson, Mermin, Resseger (2007), the definition for a category seems to be cases where the importance of a skill or activity is 4 or above on the five-point scale.

Seemingly, the way to do this is to use the O*Net text files to identify which job codes are physically demanding (PD) in one notebook, and to store the results as a dictionary that will be used to map CPS SOCs to either PD == 1 or (if missing) PD == 0.
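A rough sketch of that flagging step, under stated assumptions: the column names (`O*NET-SOC Code`, `Element Name`, `Scale ID`, `Data Value`) and the `IM` importance scale are modeled on the tab-delimited O*Net text files but should be checked against the actual release, the element list is only a subset of the Rho (2010) definition, and the sample rows are invented for illustration.

```python
import csv
import io

# Subset of the "highly physically demanding" elements; labels are assumed
PD_ELEMENTS = {"Dynamic Strength", "Static Strength", "Trunk Strength"}
THRESHOLD = 4.0  # "4 or above on the five-point scale"


def find_pd_codes(tsv_text, elements=PD_ELEMENTS, threshold=THRESHOLD):
    """Return the O*Net codes with any listed element at or above the threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged = set()
    for row in reader:
        if (row["Scale ID"] == "IM"
                and row["Element Name"] in elements
                and float(row["Data Value"]) >= threshold):
            flagged.add(row["O*NET-SOC Code"])
    return flagged


# Invented sample rows standing in for an O*Net text file
rows = [
    ["O*NET-SOC Code", "Element Name", "Scale ID", "Data Value"],
    ["47-2061.00", "Static Strength", "IM", "4.12"],
    ["47-2061.00", "Dynamic Strength", "IM", "3.50"],
    ["15-1199.04", "Static Strength", "IM", "1.25"],
    ["15-1199.04", "Near Vision", "IM", "4.50"],
]
sample = "\n".join("\t".join(r) for r in rows)

pd_lookup = {code: 1 for code in find_pd_codes(sample)}
# Codes missing from the dictionary are treated as PD == 0
print(pd_lookup.get("47-2061.00", 0), pd_lookup.get("15-1199.04", 0))
```

Storing only the flagged codes keeps the dictionary small, with `.get(code, 0)` supplying the "if missing, PD == 0" rule.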

The problem here is going to be that some O*Net job codes are more detailed than the CPS codes. In other words, for some CPS jobs, a portion of them will presumably be PD and a portion will not. But this creates a pretty big problem because 1) I do not know what portion of the CPS respondents are in each job subcategory, and 2) I want to add this information to the bd CPS microdata in a meaningful way, not just to derive summary statistics.

To handle the above issue, I'll have to identify exactly how common the problem is in the specific cases of interest. So if there are three job subcategories that I can't identify in the CPS and only one is PD, then I'm in trouble, but if 2 are PD and the third is 3/5 in one category, then maybe it's close enough. I am really hesitant to randomly assign PD to 2/3 of the CPS observations with the broader job category.

Then separately, I'll need to look into how to build a time series by matching previous SOC codes to O*Net data.

bdecon commented 5 years ago

https://www2.census.gov/programs-surveys/cps/methodology/Occupation%20Codes.pdf

bdecon commented 5 years ago

Identifying which O*NET SOC codes are PD was easy, but there's no easy way to add a PD variable to the microdata. The closest option I can think of so far is something like a variable that is equal to 1 if the Census job code is PD at every sublevel, equal to 0 if the job code is not PD at any sublevel, and otherwise equal to the share of sublevels that are PD (for example, 0.25 if 1/4 of the categories are PD). The obvious problem here is that 1/4 of the categories could be PD, but that doesn't say anything about what share of jobs are PD. The one PD category could be 98% of the jobs in the category or 2%.
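As a sketch of that share variable (a hypothetical helper with made-up codes, not the notebook's code), assuming a dictionary from each Census code to its O*Net sub-codes and a set of sub-codes already flagged as PD:

```python
def pd_share(sub_codes, pd_codes):
    """Share of a Census code's O*Net sub-codes that are flagged PD.

    Returns 1.0 if every sub-code is PD, 0.0 if none is, a fraction
    otherwise, and None when there are no sub-codes to check.
    """
    if not sub_codes:
        return None
    flagged = sum(1 for code in sub_codes if code in pd_codes)
    return flagged / len(sub_codes)


# Made-up codes for illustration
cps_to_onet = {
    "A": ["a.01", "a.02", "a.03", "a.04"],  # one of four sub-codes is PD
    "B": ["b.00"],                          # single sub-code, PD
    "C": ["c.01", "c.02"],                  # no sub-code is PD
}
pd_codes = {"a.03", "b.00"}

shares = {census: pd_share(subs, pd_codes) for census, subs in cps_to_onet.items()}
print(shares)
```

The share weights every sub-code equally, so it describes categories, not the number of workers in them.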

The issue with a PD variable where the tricky cases are handled by random assignment is that it obviously creates many wrong microdata observations. Some PD jobs will be 0 and some non-PD jobs will be 1. The original approach in Rho (2010), for example, seemingly used multi-stage random assignment. I'd guess it was something like making sure the total is correct for several age and gender subgroups of older workers. But I can't replicate that on the entire CPS, since there are thousands of possible subgroups.

And even if I could get a pretty accurate estimate of what percentage of each category each subcategory comprises, which would be better, it would be a point-in-time estimate applied to a long-term time series. As previous research indicates, the approach is not intended to show changes over time because O*Net doesn't really measure how new equipment or new techniques make a job more or less physically demanding.

All that said, the "random assignment" problem is somewhat limited to begin with, as many cases are 0 or 1 and not something in between.
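One way to check how limited the problem is would be to tally Census codes by whether their PD share is 0, 1, or something in between. The sketch below uses invented shares and a hypothetical `classify` helper.

```python
from collections import Counter


def classify(share):
    """Label a PD share as clean (all or none) or mixed."""
    if share == 0:
        return "none PD"
    if share == 1:
        return "all PD"
    return "mixed"


# Invented shares keyed by made-up Census codes
example_shares = {"A": 1.0, "B": 0.0, "C": 0.25, "D": 0.0}

tally = Counter(classify(s) for s in example_shares.values())
print(tally)
```

Only the "mixed" group would need random assignment or some other rule.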

bdecon commented 5 years ago

One corner case to think about: Computer occupations has 13 subcategories in the more detailed O*Net occupational classification. Only one of the subcategories, GIS occupations, gets flagged as PD. In that one case, I'd argue, sure, maybe it's PD, but not necessarily, and it's a small percentage of computer occupations.

It gets flagged because it gets a 4.0/5.0 on Spend Time Making Repetitive Motions.

bdecon commented 5 years ago

Created a new notebook that identifies the PD jobs and starts to analyze corner cases for further decisions to be made later. https://github.com/bdecon/econ_data/blob/master/bd_CPS/bd_CPS_ONET.ipynb

bdecon commented 4 years ago

See steps here: https://www.pewsocialtrends.org/2020/01/30/methodology-28/

bdecon commented 3 years ago

Moved to micro, putting on back burner. So many jobs changed in the past year that this is sort of less relevant.