Open amueller opened 9 years ago
Also, create_location_prob seems a bit unnecessary.
It would be nice to compare "name only" with "name and group" as done in the paper, and reproduce the plots in figure 1.
I think you can write things like
voter_file.loc[voter_file['race'] == 7, 'race'] = 6
voter_file.loc[voter_file['race'] == 1, 'race'] = 6
voter_file.loc[voter_file['race'] == 9, 'race'] = 6
as
voter_file.race = voter_file.race.replace({7: 6, 1: 6, 9: 6})
which is shorter, quicker and slightly more readable.
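A minimal sketch to check that the two forms agree. The toy `race` column below is made up for illustration, since the real voter file is not shown in this thread:

```python
import pandas as pd

# Hypothetical toy data standing in for the real voter file.
voter_file = pd.DataFrame({"race": [1, 2, 6, 7, 9, 3]})

# Original approach: one masked assignment per code.
by_loc = voter_file.copy()
for code in (7, 1, 9):
    by_loc.loc[by_loc["race"] == code, "race"] = 6

# Suggested approach: a single replace with a mapping.
by_replace = voter_file.copy()
by_replace["race"] = by_replace["race"].replace({7: 6, 1: 6, 9: 6})

# Both approaches should produce the same column.
assert by_loc["race"].equals(by_replace["race"])
print(by_replace["race"].tolist())  # [6, 2, 6, 6, 6, 3]
```

The mapping form also keeps all recoded values in one place, so adding another code is a one-entry change.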
When you call predict_ethnic, the parameters lastname and cbg2000 come from the same dataframe. Then you check them in validate_input. Why?
You could just pass the dataframe, and then you would be sure they have the same length.
Or did you call predict_ethnic from some other place?
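A rough sketch of what a dataframe-based entry point might look like. The name `predict_ethnic_df`, its parameters, and the sample values are all assumptions for illustration, not the repository's actual `predict_ethnic`, and the prediction step itself is left as a placeholder:

```python
import pandas as pd

def predict_ethnic_df(df, name_col="lastname", geo_col="cbg2000"):
    # Hypothetical signature, not the repository's actual predict_ethnic.
    missing = [col for col in (name_col, geo_col) if col not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    # Rows of one frame are aligned by construction, so no length check is needed.
    inputs = df[[name_col, geo_col]].copy()
    inputs[name_col] = inputs[name_col].str.strip().str.upper()
    # Placeholder: the real name/location probability lookup would go here.
    return inputs

# Ad hoc test inputs can still be built as a one-row frame (values are made up).
fake = pd.DataFrame({"lastname": [" smith "], "cbg2000": ["010010201001"]})
result = predict_ethnic_df(fake)
print(result)
```

With this shape, the only input check left is that the expected columns exist.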
@amueller I split the whole dataframe into lastname and cbg2000 to make the function easy to test and play with: I can make up a 'fake' name and cbg2000 pair and put them into the program to see what the result will be. I will remove create_location_prob. I will also reproduce figure 1 as soon as I finish cleaning the census block data. Thanks for the comment! :)
Thanks :) I'm still reading.
ethnic_predict.py uses test_input.csv, which is not checked in.
I'll submit a PR with some style changes and then look at the math ;)