AaronGullickson / panethnicity_intermar

Data for "Patterns of Panethnic Intermarriage in the United States, 1980-2018" forthcoming in Demography
MIT License

Specific national-origin group model #3

AaronGullickson closed this issue 3 years ago

AaronGullickson commented 3 years ago

I don't know if I can pull it off for the 1980 data, but I would like to fit a model using ACS data that looks at specific affinities between national-origin groups within the panethnic categories (e.g. Japanese and Korean, Mexican and Dominican).

AaronGullickson commented 3 years ago

Pan-Ethnicity and Intermarriage

AaronGullickson commented 3 years ago

Ok, I can fit full models to both the ACS and 1980 data, but I get warnings about convergence failure and infinite coefficients for a few group combinations with zero intermarriages. The coefficients are not literally infinite, but they are huge. These are:

  - ACS: Vietnamese/Dominican
  - 1980: 10 different cases (didn't get names on last run)
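Why a zero cell produces a "giant but finite" coefficient can be seen in a toy case, which is not the author's model: in an intercept-only logit where no outcome is ever observed, the log-likelihood keeps rising as the coefficient goes to minus infinity, so Newton-Raphson walks away by roughly one unit per iteration and simply stops wherever the iteration limit is reached. The function name and iteration count below are illustrative assumptions.

```python
import math


def newton_intercept_only_logit(y, iterations=25):
    """Fit logit(P(y = 1)) = beta by Newton-Raphson and return the path of beta."""
    n = len(y)
    successes = sum(y)
    beta = 0.0
    path = [beta]
    for _ in range(iterations):
        p = 1.0 / (1.0 + math.exp(-beta))
        score = successes - n * p        # first derivative of the log-likelihood
        info = n * p * (1.0 - p)         # negative second derivative
        beta += score / info
        path.append(beta)
    return path


# With a zero cell (no observed events) beta never settles; with one event it converges.
diverging = newton_intercept_only_logit([0] * 50)
converging = newton_intercept_only_logit([1, 0, 0, 0])
print(round(diverging[-1], 1), round(converging[-1], 4))
```

In a conditional logit the same mechanism applies to any exogamy term whose pair of groups never intermarries in the data, which is why dropping or merging such cells tends to restore convergence.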

So I could work with this, but it occurs to me that a modified model will run better in both cases and still capture most of what I need. I can fit all of the specific ethnicity-to-ethnicity terms within the Asian and Latino groups, but use panethnic categories to model intermarriage between the other racial pentagon groups. My only concerns with this model are:

  1. I may want to add a separate Filipino/Latino dummy variable to capture possible affinities there.
  2. I may want to add separate dummy variables for Black with each Latino ethnicity to capture whatever may be going on with Afro-Latinos.

AaronGullickson commented 3 years ago

Ok, so I created a race_exog_extended variable that uses specific ethnic groups for Asian/Asian, Latino/Latino, and Black/Latino exogamy. I also created a separate dummy variable for Filipino/Latino intermarriages. Models with this coding run pretty quickly.
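The coding rule described above can be sketched as follows. This is a hypothetical Python rendering, not the repository's R code: the (race, ethnicity) tuple layout and the label strings ("Asian", "Latino", "Black", "White") are assumptions. Detailed ethnic labels are used only for Asian/Asian, Latino/Latino, and Black/Latino couples; every other exogamous couple gets a pentagon-level term, and Filipino/Latino couples additionally get their own dummy.

```python
from typing import Optional, Tuple

PANETHNIC = {"Asian", "Latino"}
# A person is (pentagon race, detailed ethnicity or None); this layout is assumed.
Person = Tuple[str, Optional[str]]


def _side_label(person: Person, detailed: bool) -> str:
    """Use the detailed ethnicity for Asian/Latino spouses when detail is wanted."""
    race, eth = person
    if detailed and race in PANETHNIC and eth is not None:
        return eth
    return race


def race_exog_extended(p1: Person, p2: Person) -> Optional[str]:
    """Return the exogamy term for a couple, or None if the couple is endogamous."""
    if p1 == p2:
        return None
    races = {p1[0], p2[0]}
    detailed = (len(races) == 1 and races <= PANETHNIC) or races == {"Black", "Latino"}
    if not detailed and len(races) == 1:
        return None  # same pentagon race outside Asian/Latino: no exogamy term
    labels = sorted([_side_label(p1, detailed), _side_label(p2, detailed)])
    return "/".join(labels)


def filipino_latino(p1: Person, p2: Person) -> int:
    """Separate dummy: 1 for a Filipino/Latino couple, else 0."""
    pair = (p1, p2)
    return int(("Asian", "Filipino") in pair and any(p[0] == "Latino" for p in pair))


print(race_exog_extended(("Asian", "Japanese"), ("Asian", "Korean")))
print(race_exog_extended(("Black", None), ("Latino", "Dominican")))
```

Sorting the two labels makes each term symmetric, so a Korean/Japanese couple and a Japanese/Korean couple share one coefficient.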

I will now run a parallel set of models using this dummy coding. I need to settle on naming conventions for labeling model output. The basic structure will be model_dataname_racecode_modeltype, with the following categories:

  - dataname
  - racecode
  - modeltype
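A naming scheme like this is just the Cartesian product of the three category lists, so the number of models equals the product of the category sizes. The sketch below is a hypothetical helper; the category values are placeholders, since the actual dataname, racecode, and modeltype values are not listed here.

```python
from itertools import product


def model_names(datanames, racecodes, modeltypes):
    """Build model_dataname_racecode_modeltype labels for every combination."""
    return [f"model_{d}_{r}_{m}" for d, r, m in product(datanames, racecodes, modeltypes)]


# Placeholder category values (hypothetical).
names = model_names(["d1", "d2"], ["r1", "r2"], ["t1", "t2", "t3"])
print(len(names), names[0])
```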

Crossing these categories should give me 24 different models in total. Using the Efron method, I think it will be possible to estimate these models relatively quickly, although I am not yet doing multiple complete datasets.
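Assuming the "Efron technique" refers to Efron's approximation for tied cases in a conditional-logit (Cox partial) likelihood, such as the method = "efron" option of R's survival::clogit, the stratum-level log-likelihood can be sketched as below. The function name and inputs are illustrative; eta holds each stratum member's linear predictor.

```python
import math
from typing import Sequence


def efron_loglik(eta: Sequence[float], is_case: Sequence[bool]) -> float:
    """Efron-approximated log partial likelihood for one stratum (risk set).

    eta:     linear predictors x_k' beta for every member of the stratum
    is_case: True for the members that are observed cases (tied events)
    """
    risk = [math.exp(e) for e in eta]
    total = sum(risk)
    case_eta = [e for e, c in zip(eta, is_case) if c]
    case_risk = sum(r for r, c in zip(risk, is_case) if c)
    d = len(case_eta)
    ll = sum(case_eta)
    for j in range(d):
        # At step j, remove the fraction j/d of the tied cases' combined risk.
        ll -= math.log(total - (j / d) * case_risk)
    return ll


def breslow_loglik(eta: Sequence[float], is_case: Sequence[bool]) -> float:
    """Breslow approximation for comparison: no risk is removed between ties."""
    d = sum(is_case)
    total = sum(math.exp(e) for e in eta)
    return sum(e for e, c in zip(eta, is_case) if c) - d * math.log(total)


print(efron_loglik([0.0, 1.0, 2.0], [False, True, True]))
```

With a single case per stratum the Efron, Breslow, and exact likelihoods coincide; the approximation only matters, and only saves time relative to the exact method, when strata contain several tied cases.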

AaronGullickson commented 3 years ago

I also wonder whether, with this new coding of the extended models, I could add more ethnic groups to the ACS data. The only real requirement is that there are no zero cells in the Latino/Asian ethnicity combinations. So it may be worth exploring a bit.
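A quick way to screen candidate groups is to tabulate observed couples by unordered pair of detailed groups and list every pair that would get its own term but is never observed. This is a hypothetical sketch, assuming the requirement means each such pair must appear at least once; the group codes ("L1", "L2", ...) and the couple list are made-up placeholders, not data from the project.

```python
from collections import Counter
from itertools import combinations


def zero_pair_cells(couples, groups):
    """Return unordered pairs of distinct groups with no observed couples.

    couples: iterable of (group1, group2) tuples; spouse order is ignored
    groups:  candidate detailed groups whose pairwise terms would be estimated
    """
    counts = Counter(frozenset(pair) for pair in couples)
    return [(a, b) for a, b in combinations(groups, 2) if counts[frozenset((a, b))] == 0]


# Placeholder couples; L2/L3 never intermarry, so that term would be inestimable.
couples = [("L1", "L2"), ("L3", "L1"), ("L2", "L1")]
print(zero_pair_cells(couples, ["L1", "L2", "L3"]))
```

Running this for each candidate group list before fitting flags the pairs that would otherwise show up as huge, non-convergent coefficients.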

AaronGullickson commented 3 years ago

Note that because I am now converting race variables to factors differently, everything in analysis.Rmd will be affected and will basically need to be rewritten.

AaronGullickson commented 3 years ago

So, I decided to try adding more Latino and Asian ethnic groups and ended up with 15 Latino groups, 8 Asian groups, and 3 South Asian groups. However, that turned out to be far too much for the estimation procedure: something like 200 racial exogamy coefficients to estimate, in addition to the other parameters, for the full ACS data.

So I have pruned the list of Latino categories down to 11 by removing the four smallest. That should reduce the Latino terms from 105 parameters to 55. It's going to take a while to rerun everything, but hopefully that model will run.
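The 105 and 55 figures match one exogamy term per unordered pair of distinct groups, i.e. n(n - 1)/2, which is why the parameter count grows roughly with the square of the number of groups. A minimal check, assuming that one-term-per-pair structure:

```python
from math import comb


def n_pair_terms(n_groups: int) -> int:
    """One exogamy term for each unordered pair of distinct groups: n choose 2."""
    return comb(n_groups, 2)


print(n_pair_terms(15), n_pair_terms(11))  # 105 55
```

Dropping 4 of 15 groups removes about a quarter of the groups but nearly half of the pairwise terms.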

AaronGullickson commented 3 years ago

Ok, I have finally settled on a set of models that let me look at this parsimoniously. I have full markets for the ACS data with all identifiable ethnic groups, but I only run pentagon-style models on these datasets because there is too much sparseness and too many parameters at the specific-ethnicity level. This lets me see whether widening the criteria for Asian and Latino groups changes the simple estimate of Asian/Latino ethnic exogamy in the ACS data. The general result is that it produces slightly more exogamy, but not much difference.

I then run a "restricted" model that uses fewer of these Latino and Asian groups to look at specific ethnic combinations. Which groups are included is dictated by group size and by what works in terms of model fitting. I found that going beyond the five Asian categories does not work, because the sample sizes are so small that there is a lot of noise. I was able to use 9 Latino groups. From these models, I can do the whole tile map/dendrogram thing.
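One way to get from pairwise exogamy coefficients to a dendrogram is to turn each coefficient into a distance and run average-linkage hierarchical clustering; the final leaf order can then order the rows and columns of the tile map. This is a hypothetical pure-Python sketch, not the project's R code: the distance transform (largest coefficient minus each coefficient, so the strongest affinity gets distance zero) is an assumption, and the group codes and coefficient values are made-up placeholders.

```python
from itertools import combinations


def to_distances(coefs):
    """Map {frozenset({a, b}): coefficient} to non-negative distances (assumed transform)."""
    top = max(coefs.values())
    return {pair: top - value for pair, value in coefs.items()}


def cluster_distance(c1, c2, dist):
    """Average linkage: mean distance over all cross-cluster member pairs."""
    total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def average_linkage(labels, dist):
    """Agglomerate clusters; return (cluster1, cluster2, height) for each merge."""
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(pair[0], pair[1], dist))
        height = cluster_distance(c1, c2, dist)
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2, height))
    return merges


# Placeholder coefficients for three hypothetical groups.
coefs = {
    frozenset(("G1", "G2")): -1.0,
    frozenset(("G1", "G3")): -3.0,
    frozenset(("G2", "G3")): -2.5,
}
merges = average_linkage(["G1", "G2", "G3"], to_distances(coefs))
leaf_order = merges[-1][0] + merges[-1][1]  # usable as the tile map axis order
print(merges, leaf_order)
```

Average linkage is only one choice; single or complete linkage would change the merge heights and possibly the tree shape.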

Right now the latest iteration of this is running, but the code is all set up, so I will close this issue as soon as models.RData and analysis.html are done.

AaronGullickson commented 3 years ago

Commit 638a721dea437f8940c38d6a754b00c61e77dc89 has the models and model output to close this issue.