getmovement / deprecated-movement-rails-api

DEPRECATED Rails API for getmovement.org

Determine how to geocode voter addresses #58

Open joshsmith opened 8 years ago

joshsmith commented 8 years ago

We need to determine how we're going to geocode voters' addresses.

joshsmith commented 8 years ago

Here's a table showing, for each state, the estimated voter file cost, the estimated number of addresses, the cost to geocode 100% of those addresses through Google (without Premium pricing), and the cost to geocode 14% through Google (the optimistic best case, where SmartyStreets covers 86% of addresses at Zip9 accuracy).

| State | Voter File Cost | Est. # of Addresses | Google Cost (100%) | Google Cost (14%) |
| --- | --- | --- | --- | --- |
| Alabama | $30,000 | 1,838,683 | $919.34 | $128.71 |
| Alaska | $20 | 251,899 | $125.95 | $17.63 |
| Arizona | $32,500 | 2,370,289 | $1,185.14 | $165.92 |
| Arkansas | $2.50 | 1,129,723 | $564.86 | $79.08 |
| California | $30 | 12,542,460 | $6,271.23 | $877.97 |
| Colorado | $1,000 | 1,977,591 | $988.80 | $138.43 |
| Connecticut | $300 | 1,355,849 | $677.92 | $94.91 |
| Delaware | $10 | 335,707 | $167.85 | $23.50 |
| Florida | $5 | 7,158,980 | $3,579.49 | $501.13 |
| Georgia | $5,500 | 3,518,097 | $1,759.05 | $246.27 |
| Hawaii | $500 | 449,771 | $224.89 | $31.48 |
| Idaho | $20 | 579,797 | $289.90 | $40.59 |
| Illinois | $500 | 4,772,723 | $2,386.36 | $334.09 |
| Indiana | $5,000 | 2,481,793 | $1,240.90 | $173.73 |
| Iowa | $1,950 | 1,226,547 | $613.27 | $85.86 |
| Kansas | $200 | 1,110,440 | $555.22 | $77.73 |
| Kentucky | $450 | 1,694,996 | $847.50 | $118.65 |
| Louisiana | $5,000 | 1,707,852 | $853.93 | $119.55 |
| Maine | $2,200 | 553,823 | $276.91 | $38.77 |
| Maryland | $125 | 2,146,240 | $1,073.12 | $150.24 |
| Massachusetts | $0 | 2,530,147 | $1,265.07 | $177.11 |
| Michigan | $22.50 | 3,823,280 | $1,911.64 | $267.63 |
| Minnesota | $51 | 2,107,232 | $1,053.62 | $147.51 |
| Mississippi | $2,100 | 1,088,073 | $544.04 | $76.17 |
| Missouri | $100 | 2,360,131 | $1,180.07 | $165.21 |
| Montana | $5,000 | 405,525 | $202.76 | $28.39 |
| Nebraska | $500 | 725,787 | $362.89 | $50.81 |
| Nevada | $500 | 999,016 | $499.51 | $69.93 |
| New Hampshire | $8,200 | 518,245 | $259.12 | $36.28 |
| New Jersey | $2.55 | 3,186,418 | $1,593.21 | $223.05 |
| New Mexico | $4,800 | 761,938 | $380.97 | $53.34 |
| New York | $0 | 7,234,743 | $3,617.37 | $506.43 |
| North Carolina | $0 | 3,715,565 | $1,857.78 | $260.09 |
| North Dakota | $5,000 | 287,270 | $143.64 | $20.11 |
| Ohio | $0 | 4,557,655 | $2,278.83 | $319.04 |
| Oklahoma | $0 | 1,444,081 | $722.04 | $101.09 |
| Oregon | $500 | 1,516,456 | $758.23 | $106.15 |
| Pennsylvania | $500 | 4,958,427 | $2,479.21 | $347.09 |
| Rhode Island | $700 | 410,058 | $205.03 | $28.70 |
| South Carolina | $160 | 1,780,251 | $890.13 | $124.62 |
| South Dakota | $2,500 | 323,136 | $161.57 | $22.62 |
| Tennessee | $2,500 | 2,475,195 | $1,237.60 | $173.26 |
| Texas | $1,250 | 8,886,471 | $4,443.24 | $622.05 |
| Utah | $1,050 | 886,770 | $443.39 | $62.07 |
| Vermont | $0 | 257,004 | $128.50 | $17.99 |
| Virginia | $5,000 | 3,022,739 | $1,511.37 | $211.59 |
| Washington | $0 | 2,629,126 | $1,314.56 | $184.04 |
| Washington DC | $2 | 263,649 | $131.82 | $18.46 |
| West Virginia | $6,000 | 741,390 | $370.70 | $51.90 |
| Wisconsin | $12,500 | 2,288,332 | $1,144.17 | $160.18 |
| Wyoming | $0 | 222,846 | $111.42 | $15.60 |
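
The Google figures in the table appear consistent with a flat rate of $0.50 per 1,000 addresses (Alabama's 1,838,683 addresses come to $919.34), with the last column being 14% of that. The rate is inferred from the rows rather than stated in the thread, so treat it as an assumption. A minimal sketch of the arithmetic, with hypothetical constant and method names:

```ruby
# Rate inferred from the table rows (assumption, not stated in the thread):
# $0.50 per 1,000 geocoding requests.
GOOGLE_RATE_PER_ADDRESS = 0.0005

# Share of addresses left for Google if SmartyStreets covers 86% at Zip9.
GOOGLE_SHARE_BEST_CASE = 0.14

# Estimated Google geocoding cost in dollars, rounded to cents.
def google_geocoding_cost(address_count, share: 1.0)
  (address_count * GOOGLE_RATE_PER_ADDRESS * share).round(2)
end

full = google_geocoding_cost(1_838_683)
partial = google_geocoding_cost(1_838_683, share: GOOGLE_SHARE_BEST_CASE)
puts format("Alabama: $%.2f at 100%%, $%.2f at 14%%", full, partial)
```
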
joshsmith commented 8 years ago

The table above may be out of date. You can view the latest figures in this Google spreadsheet.

joshsmith commented 8 years ago

If we use SmartyStreets for the majority of addresses (70-86%), we can get reasonably accurate results, and we can use Google to fill in the gaps.

I'm thinking that a given Address should have, at minimum, columns like geocoded_at, geocoding_source, and accuracy_level.
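
To make the idea concrete, here is a rough sketch of how those three columns might be filled in when SmartyStreets is tried first and Google is used as the fallback. It uses a plain Ruby Struct as a stand-in for the Rails model, and the geocoder interface (a callable returning nil or a hash) is an assumption for illustration, not the real SmartyStreets or Google client API.

```ruby
# Hypothetical plain-Ruby stand-in for the Address model; in the Rails app
# these would be database columns.
Address = Struct.new(
  :street, :latitude, :longitude,
  :geocoded_at, :geocoding_source, :accuracy_level,
  keyword_init: true
)

# Tries each geocoder in order and records which one produced the result.
# Each geocoder is any callable returning nil or a hash with :latitude,
# :longitude, and :accuracy_level keys (an assumed interface).
def geocode!(address, geocoders, now: Time.now)
  geocoders.each do |source, geocoder|
    result = geocoder.call(address.street)
    next if result.nil?

    address.latitude = result[:latitude]
    address.longitude = result[:longitude]
    address.accuracy_level = result[:accuracy_level]
    address.geocoding_source = source
    address.geocoded_at = now
    return address
  end
  address # left ungeocoded when no source returns a result
end

# Stub geocoders standing in for the real services.
smarty = lambda do |street|
  next nil unless street.include?("Main")

  { latitude: 32.37, longitude: -86.30, accuracy_level: "zip9" }
end
google = ->(_street) { { latitude: 32.38, longitude: -86.31, accuracy_level: "rooftop" } }

geocoders = { "smartystreets" => smarty, "google" => google }
first = geocode!(Address.new(street: "1 Main St"), geocoders)
second = geocode!(Address.new(street: "2 Oak Ave"), geocoders)
puts "#{first.street}: #{first.geocoding_source} (#{first.accuracy_level})"
puts "#{second.street}: #{second.geocoding_source} (#{second.accuracy_level})"
```

Storing the source and accuracy level alongside the timestamp makes it possible to later re-geocode only the addresses that came from the coarser source.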

joshsmith commented 8 years ago

For addresses that SmartyStreets can only geocode to Zip9 accuracy, we may want to calculate the size of that Zip9's geographic area, assume a certain level of inaccuracy when the area is above some threshold, and slowly work to improve those addresses as we go.
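
One way to sketch the threshold check: treat the Zip9's area as a circle and use its radius as a rough scale for how far the centroid could be from the actual door. The circular approximation, the 50 m threshold, and the method names below are all illustrative assumptions, and the sketch presumes the area in square meters is already available from some boundary data.

```ruby
# Radius in meters of a circle with the same area as the Zip9, used as a
# rough scale for the possible distance between centroid and door.
def zip9_error_radius_m(area_sq_m)
  Math.sqrt(area_sq_m / Math::PI)
end

# Hypothetical threshold: beyond this radius the centroid is assumed too
# imprecise for canvassing and the address is flagged for improvement.
MAX_ACCEPTABLE_RADIUS_M = 50.0

def needs_better_geocode?(area_sq_m, threshold_m: MAX_ACCEPTABLE_RADIUS_M)
  zip9_error_radius_m(area_sq_m) > threshold_m
end

puts needs_better_geocode?(2_000.0)   # a small, dense Zip9
puts needs_better_geocode?(250_000.0) # a large, rural Zip9
```
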

joshsmith commented 8 years ago

Over time, we can potentially collect location data from canvassers' mobile devices on the ground and use it to refine our geocoded coordinates. Our ultimate aim for address data on its own should be to provide the best to-the-door accuracy for a canvasser.
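
As one possible approach (not something decided in this thread), device readings taken at the door could be combined with a component-wise median, which is less sensitive to a single bad GPS fix than a mean. The `Fix` shape, the accuracy cutoff, and the minimum number of fixes are hypothetical.

```ruby
# A GPS fix reported by a canvasser's device at the door (hypothetical shape).
Fix = Struct.new(:latitude, :longitude, :accuracy_m, keyword_init: true)

# Median of a non-empty list of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Returns [latitude, longitude] from the component-wise median of usable
# fixes, or nil when too few fixes meet the accuracy cutoff.
def refined_position(fixes, max_accuracy_m: 20.0, min_fixes: 3)
  usable = fixes.select { |fix| fix.accuracy_m <= max_accuracy_m }
  return nil if usable.length < min_fixes

  [median(usable.map(&:latitude)), median(usable.map(&:longitude))]
end

fixes = [
  Fix.new(latitude: 40.0000, longitude: -75.0000, accuracy_m: 5.0),
  Fix.new(latitude: 40.0002, longitude: -75.0002, accuracy_m: 8.0),
  Fix.new(latitude: 40.0001, longitude: -75.0001, accuracy_m: 10.0),
  Fix.new(latitude: 41.0000, longitude: -76.0000, accuracy_m: 100.0) # discarded
]
p refined_position(fixes)
```
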