
NYU-MSDSE-SWG / voter-ethnicity-inference


Some remarks #1

Open amueller opened 9 years ago

amueller commented 9 years ago

It would be nice to have instructions on how to run the experiment from Imai and Khanna (without party affiliations), and a note on which experiment in the paper it corresponds to.
Currently ethnic_predict.py uses test_input.csv, which is not checked in.
You should enable flake8 in your editor. There are undefined inputs, and it would be nice if you loosely followed PEP 8 style. (autopep8 can fix trivial issues like whitespace.)
It would be nice if you could document using numpydoc style (I think you use doxygen currently?). See https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
Some of the functions don't have documentation yet.
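For reference, a numpydoc-style docstring looks roughly like the sketch below. The function `normalize_surname` is a made-up helper, not something from this repository; it is only there to show the section layout (`Parameters`, `Returns`, `Examples`).

```python
def normalize_surname(name):
    """Normalize a surname before looking it up in a surname table.

    Hypothetical helper, used here only to illustrate the docstring layout.

    Parameters
    ----------
    name : str
        Raw surname as it appears in the voter file.

    Returns
    -------
    str
        Upper-cased surname with surrounding whitespace removed.

    Examples
    --------
    >>> normalize_surname("  smith ")
    'SMITH'
    """
    return name.strip().upper()
```

The underlined section headers are what numpydoc-aware tools parse, so keeping them consistent matters more than the exact wording.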

I'll submit a PR with some style changes and then look at the math ;)

amueller commented 9 years ago

Also, create_location_prob seems a bit unnecessary.

amueller commented 9 years ago

It would be nice to compare "name only" with "name and group" as done in the paper, and reproduce the plots in figure 1.

amueller commented 9 years ago

I think you can write things like

        voter_file.loc[voter_file['race'] == 7, 'race'] = 6
        voter_file.loc[voter_file['race'] == 1, 'race'] = 6
        voter_file.loc[voter_file['race'] == 9, 'race'] = 6

as

        voter_file.race = voter_file.race.replace({7: 6, 1: 6, 9: 6})

which is shorter, quicker and slightly more readable.
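A quick way to convince yourself the two forms agree is to run both on a toy frame. The data below is invented for the check; only the column name `race` and the codes 7, 1, 9 and 6 come from the snippet above.

```python
import pandas as pd

# Toy data, made up for this check.
voter_file = pd.DataFrame({"race": [1, 2, 6, 7, 9, 3]})

# Original approach: one masked assignment per code.
masked = voter_file.copy()
for code in (7, 1, 9):
    masked.loc[masked["race"] == code, "race"] = 6

# Suggested approach: a single replace with a mapping.
replaced = voter_file.copy()
replaced["race"] = replaced["race"].replace({7: 6, 1: 6, 9: 6})

# Both approaches should give the same column.
assert masked["race"].tolist() == replaced["race"].tolist()
```

Codes that are not in the mapping (2 and 3 here) are left untouched by both versions.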

amueller commented 9 years ago

When you call predict_ethnic, the parameters lastname and cbg2000 come from the same dataframe. Then you check them in validate_input. Why? You could just pass the dataframe, and then you would be sure they have the same length. Or do you call predict_ethnic from some other place?
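Something along these lines is what I have in mind. This is only a sketch: `select_inputs` and its signature are hypothetical, and it stands in for whatever predict_ethnic does with the two columns. The cbg2000 value in the example is made up.

```python
import pandas as pd


def select_inputs(voters, name_col="lastname", block_col="cbg2000"):
    """Hypothetical helper: pull both inputs out of a single dataframe."""
    missing = [c for c in (name_col, block_col) if c not in voters.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    # Both columns share the frame's index, so their lengths always match
    # and no separate length check is needed.
    return voters[name_col], voters[block_col]


# A one-row frame with made-up values still works for quick experiments.
fake = pd.DataFrame({"lastname": ["SMITH"], "cbg2000": ["000000000000"]})
lastname, cbg2000 = select_inputs(fake)
assert len(lastname) == len(cbg2000)
```

The only validation left is whether the expected columns exist, which is a single check at the top.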

twangnyc commented 9 years ago

@amueller I split the whole dataframe into lastname and cbg2000 to make it easy to test and play with. I can just come up with a 'fake' pair of name and cbg2000 and put them into the program to see what the result will be. I will remove create_location_prob. I will also reproduce figure 1 as soon as I finish cleaning the census block data. Thanks for the comments! :)

amueller commented 9 years ago

Thanks :) I'm still reading.