NYU-MSDSE-SWG / voter-ethnicity-inference


Some remarks #1

Open amueller opened 9 years ago

amueller commented 9 years ago

I'll submit a PR with some style changes and then look at the math ;)

amueller commented 9 years ago

Also, create_location_prob seems a bit unnecessary.

amueller commented 9 years ago

It would be nice to compare "name only" with "name and group" as done in the paper, and reproduce the plots in figure 1.
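To make the comparison concrete, here is a minimal sketch of "name only" vs "name and location". It reads "group" as the census block group (the cbg2000 column mentioned later in this thread). It also *assumes* the usual Bayesian surname-plus-geography update, in which surname and location are conditionally independent given race. The thread does not spell out the paper's formula, so treat this as an assumption. The category list and all probabilities are made up for illustration.

```python
import numpy as np

# Hypothetical category order; the real model's categories may differ.
RACES = ["white", "black", "hispanic", "asian", "other"]

def name_only(p_race_given_surname):
    """'Name only' prediction: the surname-based distribution, renormalized."""
    p = np.asarray(p_race_given_surname, dtype=float)
    return p / p.sum()

def name_and_location(p_race_given_surname, p_location_given_race):
    """'Name and location' prediction via Bayes' rule, ASSUMING surname and
    location are conditionally independent given race:
        P(r | s, g) is proportional to P(r | s) * P(g | r)
    """
    prior = np.asarray(p_race_given_surname, dtype=float)
    likelihood = np.asarray(p_location_given_race, dtype=float)
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()

# Made-up inputs for one voter: P(race | surname) and P(block group | race).
surname_probs = [0.6, 0.1, 0.2, 0.05, 0.05]
location_given_race = [0.001, 0.01, 0.002, 0.001, 0.002]

print(RACES[int(np.argmax(name_only(surname_probs)))])
print(RACES[int(np.argmax(name_and_location(surname_probs, location_given_race)))])
```

With these toy numbers, the location term flips the most likely category. Running both predictors over the whole voter file and comparing their accuracy against the self-reported race column would be one way to produce the side-by-side comparison.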

amueller commented 9 years ago

I think you can write things like

    voter_file.loc[voter_file['race'] == 7, 'race'] = 6
    voter_file.loc[voter_file['race'] == 1, 'race'] = 6
    voter_file.loc[voter_file['race'] == 9, 'race'] = 6

as

    voter_file['race'] = voter_file['race'].replace({7: 6, 1: 6, 9: 6})

which is shorter, quicker and slightly more readable.
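A quick way to check that the one-line `replace` gives the same result as the three `.loc` assignments is to run both on a small toy `race` column. The codes below are made up and do not come from the real voter file.

```python
import pandas as pd

# Toy stand-in for voter_file; race codes are illustrative only.
voter_file = pd.DataFrame({"race": [1, 2, 3, 5, 6, 7, 9, 4]})

# Original approach: three boolean-mask assignments.
via_loc = voter_file.copy()
via_loc.loc[via_loc["race"] == 7, "race"] = 6
via_loc.loc[via_loc["race"] == 1, "race"] = 6
via_loc.loc[via_loc["race"] == 9, "race"] = 6

# Suggested approach: a single mapping passed to Series.replace.
via_replace = voter_file.copy()
via_replace["race"] = via_replace["race"].replace({7: 6, 1: 6, 9: 6})

print(via_replace["race"].tolist())
print(via_loc.equals(via_replace))
```

Because every key maps to the same code, a single mask with `isin` would also work: `voter_file.loc[voter_file["race"].isin([1, 7, 9]), "race"] = 6`.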

amueller commented 9 years ago

When you call predict_ethnic, the parameters lastname and cbg2000 come from the same dataframe, yet you then check them in validate_input. Why? You could just pass the dataframe itself, and then you would be sure they have the same length. Or do you call predict_ethnic from somewhere else?

twangnyc commented 9 years ago

@amueller I split the whole dataframe into lastname and cbg2000 to make it easy to test and play with: I can just make up a 'fake' name and cbg2000 pair, feed it into the program, and see what the result is. I will remove create_location_prob. I will also reproduce figure 1 as soon as I finish cleaning the census block data. Thanks for the comments! :)
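One way to get both properties, a single dataframe argument and easy testing with a fake pair, is a small wrapper that turns one (lastname, cbg2000) pair into a one-row DataFrame. The sketch below is hypothetical. The `predict_ethnic` signature, the `surname_table` lookup (indexed by upper-cased surname, one column per category), the helper `single_voter`, and the block group ID are all made up for illustration, and the location step is omitted.

```python
import pandas as pd

REQUIRED_COLUMNS = ("lastname", "cbg2000")

def predict_ethnic(voters: pd.DataFrame, surname_table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical dataframe-based interface.

    Because lastname and cbg2000 arrive as columns of one DataFrame, they
    always have the same length; only their presence needs checking.
    Only the surname lookup is sketched here (no location step).
    """
    missing = [c for c in REQUIRED_COLUMNS if c not in voters.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    keys = voters["lastname"].str.upper()
    # reindex yields NaN rows for surnames absent from the table
    probs = surname_table.reindex(keys.to_numpy())
    probs.index = voters.index  # align results with the input rows
    return probs

def single_voter(lastname: str, cbg2000: str) -> pd.DataFrame:
    """Wrap one 'fake' (lastname, cbg2000) pair as a one-row DataFrame."""
    return pd.DataFrame({"lastname": [lastname], "cbg2000": [cbg2000]})

# Made-up surname table and block group ID, for illustration only.
surname_table = pd.DataFrame(
    {"white": [0.7, 0.1], "hispanic": [0.3, 0.9]},
    index=["SMITH", "GARCIA"],
)
result = predict_ethnic(single_voter("Garcia", "360610001001"), surname_table)
print(result)
```

Ad hoc experiments go through exactly the same code path as the full voter file, so there is no separate length check to maintain.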

amueller commented 9 years ago

Thanks :) I'm still reading.