datadesk / python-elections

A Python wrapper for the Associated Press' U.S. election data service.
python-elections.rtfd.org

MA race has wrong list of reporting units #95

Open gabrielflorit opened 10 years ago

gabrielflorit commented 10 years ago

At the moment, AP has one race's worth of test data, House District 5, which covers about 20 towns, but doing

ma = client.get_state('MA')
race = ma.races[0]
reporting_units = race.reporting_units

returns a collection of all 352 MA towns. Some of them have votes, some don't. How can I help?

gabrielflorit commented 10 years ago

This might be due to https://github.com/datadesk/python-elections/blob/master/elections/ap.py#L494:

for r in ru_list:
  # if `st_postal` is in the dict, we're getting Top of the Ticket data,
  # so we want to put reportingunits in the state they belong to.
  # otherwise stuff the RUs into all of the races, as they're all in the same state.
  <code here>

In MA, not all reporting units are in all races. U.S. House District 5 only has its 20 or so towns, so that race should only have a limited number of RUs. Correct? Or am I missing something in the implementation?
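To make the idea concrete, here is a minimal sketch of per-race assignment. It is not the code in ap.py: the function name, the dict shapes, and the `race_to_units` mapping are all hypothetical, and the reporting unit number 99999 is made up to stand for a town outside the race.

```python
def assign_reporting_units(races, reporting_units, race_to_units):
    """Attach to each race only the reporting units mapped to it.

    races: list of dicts with an "ra_number" key (hypothetical shape)
    reporting_units: list of dicts with an "ru_number" key (hypothetical shape)
    race_to_units: dict of ra_number -> set of ru_numbers (assumed to exist)
    """
    by_number = {ru["ru_number"]: ru for ru in reporting_units}
    assigned = {}
    for race in races:
        wanted = race_to_units.get(race["ra_number"], set())
        # Keep only the units that are both mapped to this race and present.
        assigned[race["ra_number"]] = [
            by_number[n] for n in sorted(wanted) if n in by_number
        ]
    return assigned


races = [{"ra_number": "24356"}]
units = [{"ru_number": "1"}, {"ru_number": "22010"}, {"ru_number": "99999"}]
mapping = {"24356": {"1", "22010"}}  # 99999 is deliberately left out
result = assign_reporting_units(races, units, mapping)
print([ru["ru_number"] for ru in result["24356"]])
```

With a mapping like this, a race would carry only its own towns instead of every reporting unit in the state.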

gabrielflorit commented 10 years ago

I also don't see any reference to ftp://electionsonline.ap.org/inits/STATE/STATE_rura.txt in ap.py. If I understand correctly, this file maps reporting units to races. For example, here's ftp://electionsonline.ap.org/inits/MA/MA_race.txt - it contains 1 race:

| ra_number | of_number | se_number | race_id | office_id | ra_num_winners | rt_number | rt_party_name | of_description | ot_number | ot_name | se_name | of_scope | race_order | el_date | ra_uncontested | ra_tabulate | st_postal | ra_national_b |
| 24356 | 22006 | 5 | G | H | 1 | 7 | | | 3 | U.S. House | District 5 | L | 400 | 20131210 | 0 | 1 | MA | 1 |

And ftp://electionsonline.ap.org/inits/MA/MA_rura.txt - it contains several reporting units mapped to the abovementioned race:

| ru_number | ra_number | rura_tot_pcts |
| 1 | 24356 | 243 |
| 22010 | 24356 | 21 |
| 22014 | 24356 | 5 |
| 22026 | 24356 | 8 |
| 22049 | 24356 | 17 |
| 22100 | 24356 | 18 |
| 22136 | 24356 | 4 |
| 22155 | 24356 | 9 |
| 22157 | 24356 | 2 |
| 22165 | 24356 | 16 |
| 22176 | 24356 | 16 |
| 22178 | 24356 | 14 |
| 22198 | 24356 | 10 |
| 22248 | 24356 | 21 |
| 22269 | 24356 | 1 |
| 22276 | 24356 | 3 |
| 22284 | 24356 | 7 |
| 22288 | 24356 | 5 |
| 22308 | 24356 | 18 |
| 22314 | 24356 | 12 |
| 22315 | 24356 | 4 |
| 22330 | 24356 | 4 |
| 22344 | 24356 | 8 |
| 22346 | 24356 | 6 |
| 22347 | 24356 | 14 |
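
As a rough sketch of how that file could be read, the snippet below turns pipe-delimited text into a race-to-reporting-units lookup. It assumes the file looks like the rows pasted above (a header row, with a leading and trailing pipe on every line), which I have not verified against the raw FTP file. `parse_rura` is a hypothetical helper, not something in ap.py.

```python
from collections import defaultdict


def parse_rura(text):
    """Map each ra_number to the set of ru_numbers listed for it."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Drop the empty strings produced by the leading and trailing pipes.
    header = [c.strip() for c in lines[0].split("|")[1:-1]]
    mapping = defaultdict(set)
    for ln in lines[1:]:
        cells = [c.strip() for c in ln.split("|")[1:-1]]
        row = dict(zip(header, cells))
        mapping[row["ra_number"]].add(row["ru_number"])
    return dict(mapping)


# First three rows of the MA_rura.txt excerpt above.
SAMPLE = """
| ru_number  |ra_number  |rura_tot_pcts|
|  1|  24356|  243|
|  22010|  24356|   21|
|  22014|  24356|5|
"""

race_units = parse_rura(SAMPLE)
print(sorted(race_units["24356"]))
```

The numbers are kept as strings here so they can be compared directly with whatever the other init files contain; converting them to integers would be a reasonable alternative.
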
palewire commented 10 years ago

Just seeing this ticket now. Sorry for the delay. I'll take a look when I get into the office later today.

palewire commented 10 years ago

I think there are at least two issues here:

At the moment, a script like the following can filter reporting units down to those where votes_cast is greater than zero, but there is no adequate way to separate out the ones whose empty values are legitimate.

# Log in
from elections import AP
client = AP("username", "password")
# Pull California results
ca = client.get_state('CA')
# Filter down to the House race on the Westside of Los Angeles to replace Waxman
race = ca.filter_races(name="U.S. House District 33 - Santa Monica")[0]
# Print out how many reporting units it has
ru = race.reporting_units
print(race)
# This will return all 58 California counties plus the 1 statewide reporting unit
print("reporting_units: %s" % len(ru))
# This will return only the 58 counties
print("counties: %s" % len(race.counties))
# Now if I filter the reporting units down to only those that have votes cast
# this will print only the statewide reporting unit and the Los Angeles County reporting unit
with_votes = [r for r in ru if r.votes_cast > 0]
print("With results: %s" % len(with_votes))
# Use a separate loop variable so the `ru` list above is not overwritten
for unit in with_votes:
    print("- %s" % unit)

Here's what that prints out:

U.S. House District 33 - Santa Monica
reporting_units: 59
counties: 58
With results: 2
- California (state)
- Los Angeles

That's obviously a hole. It would be nice to get back only the valid reporting units for U.S. House and statehouse legislative races. I think you have two options:

  1. The bug fix: We figure out how to validate the reporting unit list for each race as you propose and patch the library
  2. The workaround: If you only need top-level results for your legislative races and don't need to break out or visualize the results by town or county, you could ignore the reporting_units method and pull the results directly from the state method, which returns the summary horse-race totals. That's what we do for our stuff, which is why the need you've articulated has never come up for us. A script that did that for you would look more like:
from elections import AP

# Log in and pull California results
client = AP("username", "password")
ca = client.get_state('CA')
race = ca.filter_races(name="U.S. House District 33 - Santa Monica")[0]
# Print the statewide summary results for the race
print(race.state.results)

Here's what that prints out:

[<Result: Elan Carr, California (state), 17904>, <Result: Ted Lieu, California (state), 15870>, <Result: Wendy Greuel, California (state), 13976>, <Result: Marianne Williamson, California (state), 10789>, <Result: Matt Miller, California (state), 9973>, <Result: Lily Gilani, California (state), 5831>, <Result: Barbara Mulvaney, California (state), 1945>, <Result: Kevin Mottus, California (state), 1935>, <Result: David Kanuth, California (state), 1177>, <Result: Kristie Holmes, California (state), 770>, <Result: Mark Herd, California (state), 674>, <Result: Michael Sachs, California (state), 552>, <Result: Michael Shapiro, California (state), 530>, <Result: Tom Fox, California (state), 397>, <Result: Zein Obagi, California (state), 360>, <Result: Vince Flaherty, California (state), 258>, <Result: James Graf, California (state), 245>, <Result: Brent Roske, California (state), 135>]
gabrielflorit commented 10 years ago

Thanks for this super-detailed reply. Since I do want to visualize races by the smallest geographical entity, I think the best option would be to add a bit of code that uses the ftp://electionsonline.ap.org/inits/MA/MA_rura.txt dictionary for NE states. I'll see about doing that in the coming weeks.

palewire commented 10 years ago

Okay. I'll be interested in what you come up with. Though be sure the work is necessary and you can't just skate by looping through the races and using their global state results while ignoring all the rest of the reporting units.
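
Roughly what I mean by skating by, as a sketch: it relies only on the `races` and `state.results` attributes used earlier in this thread. `topline_results` is a hypothetical helper name, and the stand-in objects and values at the bottom are made up so the snippet runs without AP credentials.

```python
from types import SimpleNamespace


def topline_results(state):
    """Pair every race with its statewide summary results,
    ignoring the rest of the reporting units entirely."""
    return [(race, race.state.results) for race in state.races]


# Stand-ins for what client.get_state('MA') would return (made-up values).
fake_race = SimpleNamespace(
    name="U.S. House District 5",
    state=SimpleNamespace(results=["candidate A", "candidate B"]),
)
fake_state = SimpleNamespace(races=[fake_race])

for race, results in topline_results(fake_state):
    print("%s: %s" % (race.name, results))
```
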

palewire commented 10 years ago

Any luck with this?

gabrielflorit commented 10 years ago

Sorry - not yet.