NCVotes / ncvoter

Moved from the reesenewslab GitHub. Now just a home for issues without a home. All the action's at https://github.com/NCVotes/ncvoter/issues

Match candidates to voters #9

Open rtburg opened 7 years ago

rtburg commented 7 years ago

This item can be done after #7 is closed but may be easier if #8 is also closed.

Join the candidates table with the ncvoter table on names and addresses to obtain the demographics and voting history of candidates in the NC 2017 municipal elections.

We should determine a confidence level for each "match" and flag candidate records that fall below a chosen threshold so that we can later attempt a manual match.

bill10 commented 7 years ago

Do we want to make a table for this or a view? I'd guess the results will change frequently. The match itself is a simple SQL query:

select c.*, v.*
from
    (select *
     from contest_county
     where election_date > '11/01/2017') as c,
    voter_ncvoter v
where
    dmetaphone(c.first_name) = dmetaphone(v.first_name)
    and lower(c.last_name) = lower(v.last_name)
    and lower(c.address) = lower(v.res_street_address)
    and lower(c.city) = lower(v.res_city_desc)
    and c.zip = v.zip_code;

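This query does a fuzzy (Double Metaphone) match on the first name and a case-insensitive exact match on the last name, street address, and city. The ZIP code is compared exactly as stored.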

We can also use the Levenshtein distance to compare strings and use that distance as a confidence level.
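To make the Levenshtein idea concrete, here is a minimal Python sketch. It is not code from this repo: the function names `levenshtein`, `confidence`, and `needs_manual_review`, the length-based normalization, and the 0.8 threshold are all assumptions chosen for illustration. The same comparison could probably be done in the database, since PostgreSQL's fuzzystrmatch extension (which provides `dmetaphone`) also provides a `levenshtein` function.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def confidence(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]: 1.0 means identical after normalization."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def needs_manual_review(candidate_value: str, voter_value: str,
                        threshold: float = 0.8) -> bool:
    """Flag a pair whose score falls below the (assumed) threshold."""
    return confidence(candidate_value, voter_value) < threshold
```

Called as `needs_manual_review(candidate_name, voter_name)`, this would flag the pairs that fall below the threshold for a later manual match. Dividing by the longer string's length keeps scores comparable between short and long values, but the right threshold would have to be tuned against real candidate and voter records.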

rtburg commented 7 years ago

Didn't mean to un-assign you, @bill10 ! Sorry! ... A view, I think...

rtburg commented 7 years ago

@bill10 When I run this query (and variations of it) I keep getting zero results, even on very specific examples I think should return one specific row. Can you take a look at it and post the results here?