PhillyVoteByMail / phillyvotebymail


Process registered voter data #10

Open conorgil opened 3 years ago

conorgil commented 3 years ago

We purchased and downloaded the registered voter data for Philadelphia. The data is uploaded to our Google Drive here (I gave you both access).

The download came with a doc that explains the file format, column headers, etc.

Acceptance Criteria:

I can imagine a few heuristics for who we should send postcards to, so I think our DB should include more than the name and mailing address of each voter. A few ideas:

  1. Choose randomly from voters who have participated in an election more recently than X date.
  2. Choose randomly from voters who have not participated in an election more recently than X date.
  3. Should we contact inactive voters, or only active voters?
  4. Should we avoid contacting voters who registered to vote very recently? Since they just registered, there may be a higher chance that they already know they can vote by mail, compared to someone who registered years ago and has not heard that they can vote by mail this year.
  5. Other? What are your ideas?
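
To make ideas 1-4 concrete, here is a minimal Go sketch of how they could be expressed as optional filters applied before a random draw. Everything in it is an assumption for illustration: the field names (Active, Registered, LastVoted), the Criteria type, the seed parameter, and the sample records are hypothetical, and the real columns are whatever the doc that came with the download describes.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Voter carries the extra fields the heuristics would need.
// All field names are hypothetical, not the real column headers.
type Voter struct {
	ID         string
	Name       string
	Address    string
	Active     bool      // active vs. inactive registration status
	Registered time.Time // date the voter registered
	LastVoted  time.Time // zero value means no recorded participation
}

// Criteria expresses ideas 1-4 as optional filters; a zero time means "not set".
type Criteria struct {
	VotedAfter       time.Time // idea 1: voted more recently than this date
	NotVotedAfter    time.Time // idea 2: has not voted more recently than this date
	ActiveOnly       bool      // idea 3: skip inactive voters
	RegisteredBefore time.Time // idea 4: skip very recent registrants
}

// Matches reports whether v passes every filter that is set.
func (c Criteria) Matches(v Voter) bool {
	if c.ActiveOnly && !v.Active {
		return false
	}
	if !c.VotedAfter.IsZero() && !v.LastVoted.After(c.VotedAfter) {
		return false
	}
	if !c.NotVotedAfter.IsZero() && v.LastVoted.After(c.NotVotedAfter) {
		return false
	}
	if !c.RegisteredBefore.IsZero() && !v.Registered.Before(c.RegisteredBefore) {
		return false
	}
	return true
}

// ChooseRandom returns up to n matching voters, sampled without replacement.
// The seed is passed in so a run can be reproduced.
func ChooseRandom(voters []Voter, c Criteria, n int, seed int64) []Voter {
	var eligible []Voter
	for _, v := range voters {
		if c.Matches(v) {
			eligible = append(eligible, v)
		}
	}
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(eligible), func(i, j int) {
		eligible[i], eligible[j] = eligible[j], eligible[i]
	})
	if n < 0 {
		n = 0
	}
	if n > len(eligible) {
		n = len(eligible)
	}
	return eligible[:n]
}

// day is a small helper for building dates in UTC.
func day(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	// Made-up sample records, not real voter data.
	voters := []Voter{
		{ID: "1", Name: "Voter One", Active: true, Registered: day(2010, 1, 1), LastVoted: day(2019, 11, 5)},
		{ID: "2", Name: "Voter Two", Active: true, Registered: day(2012, 1, 1), LastVoted: day(2014, 11, 4)},
		{ID: "3", Name: "Voter Three", Active: false, Registered: day(2012, 1, 1)},
	}
	c := Criteria{ActiveOnly: true, NotVotedAfter: day(2018, 1, 1)}
	for _, v := range ChooseRandom(voters, c, 2, 42) {
		fmt.Println(v.ID, v.Name)
	}
}
```

Keeping the filters separate from the random draw means the answers to questions 3 and 4 can change without touching the selection code, but it does require that the DB store status, registration date, and last participation date alongside name and address.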

ravenac95 commented 3 years ago

So I think there are two aspects here that are important:

  1. What's the interface to this DB in code?
  2. How do we pseudo-randomly choose people and how do we store them in the DB?

I'm only going to address (1) because I think that (2) should be encapsulated in (1).

Proposal for the DB interface, for whoever writes the DB:

// pseudo Go code coming up

// Voter object
type Voter struct {
  id      string
  name    string
  address string
}

// Retrieve random voters.
// Returns a slice of Voter objects.
voters := voterDb.chooseRandomVoters()

// Get a single voter
voter := voters[0]

// Record that the postcard was sent so this voter is not chosen again
voterDb.confirmPostcardSentToVoter(voter)

In terms of how we store this, I think the easiest thing to do is process all the data and load it into DynamoDB in some pre-sorted order. Then we just scan from the beginning of the DB and move records between tables. The interface above should be flexible enough that we could later switch to a more random selection and change the underlying implementation. There's a small chance of sending twice to the same person, but we can guard against that by ensuring only a single process of this worker is running at a given time. For now, an approach that is hacky but keeps user data safe is the quickest way to the finish (our own costs are our own burden).
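
As a rough illustration of the "scan from the beginning and move between tables" idea, here is an in-memory Go sketch. It is not DynamoDB code: the slice and map stand in for the two tables, and the VoterDB interface name, the scanDB type, and the batchSize setting are all hypothetical. The mutex only protects callers inside one process; running a single worker process, as described above, is still what prevents double sends across processes.

```go
package main

import (
	"fmt"
	"sync"
)

// Voter mirrors the proposed object (exported field names are an assumption).
type Voter struct {
	ID, Name, Address string
}

// VoterDB is the proposed interface written as a Go interface.
type VoterDB interface {
	ChooseRandomVoters() []Voter
	ConfirmPostcardSentToVoter(v Voter) error
}

// scanDB stands in for two tables: pending is loaded once in a pre-sorted
// (for example pre-shuffled) order, and sent holds confirmed voters.
type scanDB struct {
	mu        sync.Mutex // guards concurrent callers within this process only
	pending   []Voter
	sent      map[string]Voter
	batchSize int
}

var _ VoterDB = (*scanDB)(nil) // compile-time check that scanDB fits the interface

func newScanDB(presorted []Voter, batchSize int) *scanDB {
	if batchSize < 1 {
		batchSize = 1
	}
	return &scanDB{
		pending:   append([]Voter(nil), presorted...),
		sent:      map[string]Voter{},
		batchSize: batchSize,
	}
}

// ChooseRandomVoters returns the next batch from the front of pending. Any
// randomness comes from how the data was ordered at load time, not from here.
func (db *scanDB) ChooseRandomVoters() []Voter {
	db.mu.Lock()
	defer db.mu.Unlock()
	n := db.batchSize
	if n > len(db.pending) {
		n = len(db.pending)
	}
	return append([]Voter(nil), db.pending[:n]...)
}

// ConfirmPostcardSentToVoter moves a voter from pending to sent so the voter
// is not chosen again. Confirming the same voter twice is a no-op.
func (db *scanDB) ConfirmPostcardSentToVoter(v Voter) error {
	db.mu.Lock()
	defer db.mu.Unlock()
	for i, p := range db.pending {
		if p.ID == v.ID {
			db.pending = append(db.pending[:i], db.pending[i+1:]...)
			db.sent[v.ID] = p
			return nil
		}
	}
	if _, ok := db.sent[v.ID]; ok {
		return nil
	}
	return fmt.Errorf("voter %q not found", v.ID)
}

func main() {
	// Made-up sample records, not real voter data.
	var db VoterDB = newScanDB([]Voter{
		{ID: "1", Name: "Voter One"},
		{ID: "2", Name: "Voter Two"},
		{ID: "3", Name: "Voter Three"},
	}, 2)
	for {
		batch := db.ChooseRandomVoters()
		if len(batch) == 0 {
			break
		}
		for _, v := range batch {
			// The postcard would be sent here, before confirming.
			if err := db.ConfirmPostcardSentToVoter(v); err != nil {
				fmt.Println("confirm failed:", err)
				return
			}
			fmt.Println("sent to", v.ID)
		}
	}
}
```

Because callers only see the two methods, the front-of-table scan could later be swapped for a genuinely random pick without changing the worker code.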