AaronGullickson / panethnicity_intermar

Data for "Patterns of Panethnic Intermarriage in the United States, 1980-2018" forthcoming in Demography
MIT License

Check for issues with language coding #17

Closed: AaronGullickson closed this 3 years ago

AaronGullickson commented 3 years ago

I need to take a closer look at the difference between the general (language) and detailed (languaged) codes to make sure I am capturing the relevant categorization wherever possible.

AaronGullickson commented 3 years ago

OK, I have looked through this, and the general codes are definitely not good enough. They provide a lot of detail for European languages but lump vast geographic regions together (e.g., "Sub-Saharan African").

The detailed codes work well in many cases, although the 1980 data often provide too much detail because nothing was re-coded from what respondents provided. Ideally, I would also want to look at a linguistic measure of language similarity but this would be quite an undertaking.

I think I will use the detailed codes, with adjustments to make the two time periods more comparable. The specific changes follow.

AaronGullickson commented 3 years ago

In general, I will use the detailed codes with the following adjustments:

There is also the problem of residual groupings of "other" or "nec" (not elsewhere classified) languages. These should probably all be given a single code and not be considered endogamous with anything else. They are the following codes:
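A minimal sketch of how the residual rule might be applied when classifying couples. It assumes each spouse has one detailed language code. The numeric values in `RESIDUAL_CODES` are hypothetical placeholders, not the actual languaged values. The names `RESIDUAL_CODES`, `RESIDUAL`, `collapse_residual`, and `same_language` are also illustrative only. The point is that all residual codes collapse to one value, and that value never counts as a language match, even when both spouses share it.

```python
# HYPOTHETICAL placeholder values for "other"/"nec" detailed language codes;
# these are NOT the real languaged codes.
RESIDUAL_CODES = {9990, 9991, 9992}
# Single (hypothetical) code that every residual category is collapsed into.
RESIDUAL = -1


def collapse_residual(code: int) -> int:
    """Map any "other"/"nec" detailed code to the single residual code."""
    return RESIDUAL if code in RESIDUAL_CODES else code


def same_language(spouse_a: int, spouse_b: int) -> bool:
    """Return True only if both spouses share a substantive language code.

    Residual codes are never treated as endogamous, even when both
    spouses fall into the residual category.
    """
    a = collapse_residual(spouse_a)
    b = collapse_residual(spouse_b)
    if a == RESIDUAL or b == RESIDUAL:
        return False
    return a == b


if __name__ == "__main__":
    print(same_language(1200, 1200))  # True: same substantive code
    print(same_language(9990, 9990))  # False: both residual
    print(same_language(9990, 9991))  # False: collapsed, but never endogamous
```

Treating the residual code as a non-match, rather than dropping those couples, keeps them in the denominator while preventing an "other" category from inflating language endogamy.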