PrincetonUniversity / gerrymandertests

Lots of metrics for quantifying gerrymandering.
GNU General Public License v3.0

First Notebook example doesn't work: apparently expects a state data file to already be there? #2

Open akkana opened 4 years ago

akkana commented 4 years ago

I'm trying to run the gerrymandertests, but apparently it relies on my separately downloading state-specific files (I'm particularly interested in New Mexico) and I can't find any documentation on where to get them.

If I just run the notebook, here's the error:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-54dcfe840d25> in <module>
     41 
     42 for chamber in chambers:
---> 43     chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
     44     chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
     45         chambers[chamber]['elections_df'],

~/outsrc/gerrymandertests/gerrymetrics/utils.py in parse_results(input_filepath, start_year, coerce_odd_years)
     12     '''
     13 
---> 14     df = pd.read_csv(input_filepath)
     15 
     16     df = df[df['Year'] >= start_year]

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    674         )
    675 
--> 676         return _read(filepath_or_buffer, kwds)
    677 
    678     parser_f.__name__ = name

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    446 
    447     # Create the parser.
--> 448     parser = TextFileReader(fp_or_buf, **kwds)
    449 
    450     if chunksize or iterator:

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    878             self.options["has_index_names"] = kwds["has_index_names"]
    879 
--> 880         self._make_engine(self.engine)
    881 
    882     def close(self):

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1112     def _make_engine(self, engine="c"):
   1113         if engine == "c":
-> 1114             self._engine = CParserWrapper(self.f, **self.options)
   1115         else:
   1116             if engine == "python":

~/pythonenv/gerry/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1889         kwds["usecols"] = self.usecols
   1890 
-> 1891         self._reader = parsers.TextReader(src, **kwds)
   1892         self.unnamed_cols = self._reader.unnamed_cols
   1893 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File election_data/state_legislative/state_legislative_election_results_post1971.csv does not exist: 'election_data/state_legislative/state_legislative_election_results_post1971.csv'

election_data/congressional_election_results_post1948.csv comes as part of the repository, but election_data/state_legislative/ is an empty directory. Where can I get the files that it expected there?

In NM we're actively fighting for better redistricting (I'm webmaster for fairdistrictsnm.org) and I'd love to get some quantitative measurements I could show to legislators and display on the website.

hjohns12 commented 4 years ago

Hi @akkana, the file you're looking for is here: https://github.com/PrincetonUniversity/historic_state_legislative_election_results/blob/2bf28f2ac1a74636b09dfb700eef08a4324d2650/state_legislative_election_results_post1971.csv

I'll update the notebook so that its file path points to this data set!
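
For anyone who wants to script the download, here is a minimal sketch. It assumes GitHub's usual convention that a github.com "blob" page URL maps to a raw.githubusercontent.com URL with the /blob segment dropped. The helper names blob_to_raw_url and fetch_state_data are made up for illustration and are not part of gerrymetrics.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Blob URL quoted in the comment above.
BLOB_URL = (
    "https://github.com/PrincetonUniversity/"
    "historic_state_legislative_election_results/blob/"
    "2bf28f2ac1a74636b09dfb700eef08a4324d2650/"
    "state_legislative_election_results_post1971.csv"
)


def blob_to_raw_url(blob_url):
    """Turn a github.com 'blob' page URL into the matching raw-file URL."""
    return (
        blob_url
        .replace("https://github.com/", "https://raw.githubusercontent.com/", 1)
        .replace("/blob/", "/", 1)
    )


def fetch_state_data(dest_dir="election_data/state_legislative"):
    """Download the CSV into the directory the notebook reads from.

    Hypothetical helper: skips the download if the file is already there.
    """
    dest = Path(dest_dir) / "state_legislative_election_results_post1971.csv"
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urlretrieve(blob_to_raw_url(BLOB_URL), str(dest))
    return dest
```

Calling fetch_state_data() from the repository root should leave the file at the path named in the FileNotFoundError above. Nothing is downloaded until the function is called.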

akkana commented 4 years ago

Thanks! I downloaded that and put it in election_data/state_legislative and got past that error. Now it's dying with a different error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-54dcfe840d25> in <module>
     41 
     42 for chamber in chambers:
---> 43     chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
     44     chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
     45         chambers[chamber]['elections_df'],

~/outsrc/gerrymandertests/gerrymetrics/utils.py in parse_results(input_filepath, start_year, coerce_odd_years)
     34     new['District Numbers'] = grouped['District'].apply(list)
     35 
---> 36     if df.columns.contains('Dem Votes'):
     37         new['Weighted Voteshare'] = grouped['Dem Votes'].apply(sum) / (grouped['Dem Votes'].apply(sum) +
     38                                                          grouped['GOP Votes'].apply(sum))

AttributeError: 'Index' object has no attribute 'contains'

akkana commented 4 years ago

I realized that was with the pip-installed gerrymetrics, so I tried pip uninstall gerrymetrics followed by pip install . from the checked-out code, and got the same error. If it matters, this virtualenv's pandas reports version 1.0.1 (Python version 3.7.5).
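
The error is consistent with that pandas version: Index.contains was removed in pandas 1.0, and the in operator is the replacement. Below is a minimal sketch of the membership test, not the actual parse_results() code. The toy DataFrame and the has_column helper are made up for illustration, and the real function sums per group rather than over the whole frame.

```python
import pandas as pd

# Toy data with made-up numbers, only to demonstrate the membership test.
df = pd.DataFrame({
    'Year': [2016, 2016],
    'State': ['NM', 'NM'],
    'District': [1, 2],
    'Dem Votes': [100, 50],
    'GOP Votes': [60, 90],
})


def has_column(frame, name):
    """Return True if `name` is a column label (works on old and new pandas)."""
    return name in frame.columns


# Replaces `df.columns.contains('Dem Votes')`, which fails on pandas >= 1.0.
if has_column(df, 'Dem Votes'):
    dem = df['Dem Votes'].sum()
    gop = df['GOP Votes'].sum()
    weighted_voteshare = dem / (dem + gop)
```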

hjohns12 commented 4 years ago

Hi @akkana,

I tried to reproduce your issue but was not able to do so. I created a virtual environment (python version 3.7.4) and successfully installed gerrymetrics just now. I wonder if your issue is coming up because your version of pandas does not agree with the version of pandas automatically installed by this package.

What I recommend is that you create a virtual environment, and before installing any other packages, install gerrymetrics with the following code:

python3 -m venv install_ve
source install_ve/bin/activate
pip install gerrymetrics

Let me know if that works, thanks so much!

akkana commented 4 years ago

I get exactly the same error as before when I type those three lines followed by jupyter-notebook run_gerrymandering_metrics.ipynb. I also tried it outside of jupyter-notebook and got the same error: AttributeError: 'Index' object has no attribute 'contains'

akkana commented 4 years ago

If I edit utils.py and put double underscores at either end of the "contains" in the line that's erroring in parse_results() (I can't illustrate that because apparently double underscores have a meaning in markdown), I get a little farther and it even appears to download something (some data?), but then it dies with:

  File "<stdin>", line 6, in <module>
  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 66, in tests_df
    df = yearstatedf()
  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 55, in yearstatedf
    names=['Year', 'State'])
TypeError: __new__() got an unexpected keyword argument 'labels'

(I should warn you my utils.py line numbers will be a little off because I've inserted some print()s). And that does look like a Pandas difference, since the line with the error is creating a pd.MultiIndex with labels as a keyword arg.

This is Python 3.7.5 on Ubuntu 19.10, so the pandas that the virtualenv pulls in is probably tied to that. The pandas double-underscore version attribute reports 1.0.3.
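
For reference, the labels keyword of pd.MultiIndex was renamed to codes in pandas 0.24, and the old name was removed in 1.0, which matches the TypeError above. Here is a minimal sketch of building the same empty (Year, State) index with the newer keyword. The function name empty_year_state_index is hypothetical, and this is not necessarily how the package will fix it.

```python
import pandas as pd


def empty_year_state_index():
    """Build an empty two-level index using `codes` instead of `labels`."""
    return pd.MultiIndex(
        levels=[[], []],
        codes=[[], []],          # was: labels=[[], []]
        names=['Year', 'State'],
    )


index = empty_year_state_index()
df = pd.DataFrame(index=index)   # empty frame keyed by (Year, State)
```

Note that codes only exists in pandas 0.24 and later, so code that must also run on older pandas would need a fallback to labels.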

hjohns12 commented 4 years ago

@akkana I just pushed some code that updates the pandas syntax and data path. Will you try cloning again with the updated code and run in a virtual environment with:

python3 -m venv install_ve
source install_ve/bin/activate
pip install gerrymetrics
jupyter-notebook run_gerrymandering_metrics.ipynb

Thanks so much!

akkana commented 4 years ago

Sorry for the delay, I've been super busy with election stuff.

Following those instructions (after git pull in the gerrymandertests repo) gives this mysterious error:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-4-9649b5edd3ef> in <module>
----> 1 import gerrymetrics as g
      2 import IPython.display as ipd
      3 
      4 from collections import defaultdict
      5 

~/outsrc/gerrymandertests/gerrymetrics/__init__.py in <module>
----> 1 from .metrics import *
      2 from .plots import *
      3 from .utils import *

~/outsrc/gerrymandertests/gerrymetrics/metrics.py in <module>
     11 from __future__ import division  # for python 2
     12 import numpy as np
---> 13 import scipy.stats as sps
     14 
     15 

ModuleNotFoundError: No module named 'scipy'

It's mysterious because clearly scipy is there; if I run python inside the venv and run import scipy.stats as sps, it works fine. But it doesn't work inside the notebook.
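
One way to diagnose a mismatch like this is to ask the running kernel which interpreter it is using and whether that interpreter can see the module. Here is a minimal sketch using only the standard library, to be pasted into a notebook cell and into the venv's python for comparison. The function name describe_environment is made up for illustration.

```python
import importlib.util
import sys


def describe_environment(module_name="scipy"):
    """Report which interpreter is running and whether it can find a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        # In a venv created with `python3 -m venv`, prefix differs from base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


print(describe_environment())
```

If the notebook reports a different executable than the venv's python does, the notebook server was started by a different interpreter.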

Aha: that's because Ubuntu's jupyter-notebook begins with #!/usr/bin/python3, so it runs the system Python rather than the venv's. So I ran pip install jupyterlab, then ran install_ve/bin/jupyter-notebook run_gerrymandering_metrics.ipynb. That gets me past the import error, and now it dies with:

Traceback (most recent call last):

  File "/home/akkana/outsrc/gerrymandertests/install_ve/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-2-9649b5edd3ef>", line 1, in <module>
    import gerrymetrics as g

  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/__init__.py", line 3, in <module>
    from .utils import *

  File "/home/akkana/outsrc/gerrymandertests/gerrymetrics/utils.py", line 37
    if 'Dem Votes' in df.columns:
    ^
IndentationError: unexpected indent

Sure enough, that line is indented more than the lines before it. If I fix the indentation, I get a little farther:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-9649b5edd3ef> in <module>
     39     print(chamber)
     40     chambers[chamber]['elections_df'] = g.parse_results(chambers[chamber]['filepath'])
---> 41     chambers[chamber]['tests_df'] = g.tests_df(g.run_all_tests(
     42         chambers[chamber]['elections_df'],
     43         impute_val=impute_val,

~/outsrc/gerrymandertests/gerrymetrics/utils.py in tests_df(tests_dict)
     63     '''
     64 
---> 65     df = yearstatedf()
     66 
     67     for year in tests_dict:

~/outsrc/gerrymandertests/gerrymetrics/utils.py in yearstatedf()
     50     '''
     51 
---> 52     index = pd.MultiIndex(levels=[[], []],
     53                           labels=[[], []],
     54                           names=['Year', 'State'])

TypeError: __new__() got an unexpected keyword argument 'labels'

so alas, now I'm just back to the error from two weeks ago.