A tool for digitizing election results data in the form of handwritten digits.
The instructions below should get you set up with a development environment. To deploy to production, follow the instructions in DEPLOYMENT.md.
Install OS level dependencies:
Clone this repo & install app requirements
We recommend using virtualenv and virtualenvwrapper for working in a virtualized development environment. Read how to set up virtualenv.
Once you have virtualenvwrapper set up,
mkvirtualenv et
git clone git@github.com:datamade/election-transcriber.git
cd election-transcriber
pip install -r requirements.txt
Create a PostgreSQL database for election transcriber
If you aren't already running PostgreSQL, we recommend installing version 9.6 or later.
createdb election_transcriber
Create your own app_config.py file
cp transcriber/app_config.py.example transcriber/app_config.py
You will need to change, at minimum:
- DB_USER and DB_PW to reflect your PostgreSQL username/password (by default, the username is your computer name & the password is '')
- S3_BUCKET to tell the application where to look for your cache of images to transcribe
- AWS_CREDENTIALS_PATH to tell the application where to find the CSV file with your AWS credentials in it. By default, the application looks for a file called credentials.csv in the root folder of the project.
You can also change the username, email and password for the initial user roles, defined by ADMIN_USER, MANAGER_USER, and CLERK_USER.
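Before pointing AWS_CREDENTIALS_PATH at a file, it can help to confirm the CSV is readable. The sketch below is illustrative only: it assumes the column headers "Access key ID" and "Secret access key" used by the AWS console export, and the function name read_aws_credentials is hypothetical. How the application itself parses this file is not shown here.

```python
import csv
import io


def read_aws_credentials(csv_file):
    """Return (access_key_id, secret_access_key) from an AWS credentials CSV.

    Assumes the header names "Access key ID" and "Secret access key";
    adjust them if your export uses different columns.
    """
    reader = csv.DictReader(csv_file)
    row = next(reader)  # the export is expected to hold one row of credentials
    # Strip stray whitespace from header names and values.
    cleaned = {key.strip(): value.strip() for key, value in row.items()}
    return cleaned["Access key ID"], cleaned["Secret access key"]


# Usage with an in-memory stand-in for credentials.csv (fake values).
sample = io.StringIO(
    "Access key ID,Secret access key\n"
    "AKIAEXAMPLEKEY,exampleSecret123\n"
)
access_key, secret_key = read_aws_credentials(sample)
```

With a real file, open it with open(path, newline="") and pass the file object in place of the in-memory sample.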
Create your own alembic.ini file
cp alembic.ini.example alembic.ini
You will need to change, at minimum, user & pass (to reflect your PostgreSQL username/password) on line 6.
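If the line in question holds a SQLAlchemy-style connection URL (an assumption; that is the usual form of the sqlalchemy.url setting in Alembic configs), special characters in the username or password need to be percent-encoded. A minimal sketch, with a hypothetical helper name and made-up credentials:

```python
from urllib.parse import quote


def postgres_url(user, password, database, host="localhost"):
    """Build a PostgreSQL connection URL with percent-encoded credentials."""
    # safe="" makes quote() escape every reserved character, including "/" and "@".
    credentials = quote(user, safe="")
    if password:
        credentials += ":" + quote(password, safe="")
    return "postgresql://{}@{}/{}".format(credentials, host, database)


# Hypothetical username and password; the database name matches the createdb step.
url = postgres_url("myuser", "p@ss/word", "election_transcriber")
no_password_url = postgres_url("myuser", "", "election_transcriber")
```

The second call covers the default case described earlier, where the password is empty and the URL carries only the username.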
Initialize the database
alembic upgrade head
Import images
python update_images.py
Run the app
python runserver.py
In another terminal, run the worker
python run_queue.py
Once the server is running, navigate to http://localhost:5000/
There is a script in the root folder of the project called syncDriveFolder.py. As you might guess, it syncs files from a Google Drive folder to an AWS S3 bucket.
Setup Google Service Account
{
"type": "service_account",
"project_id": "[name of the project]",
"private_key_id": "[long hash]",
"private_key": "[very very long hash]",
"client_email": "some-user@project-name.iam.gserviceaccount.com",
"client_id": "[long number]",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://accounts.google.com/o/oauth2/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "[long URL]"
}
As explained when you downloaded it, the contents of this file should be kept secret.
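A quick way to check the key file and pull out the service account's email address is sketched below. The set of required keys and the function name service_account_email are choices made for this example, not something the project defines; the sample document uses placeholder values in the same shape as the JSON above.

```python
import json

# Keys this sketch treats as required; taken from the example file above.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email", "token_uri"}


def service_account_email(raw_json):
    """Validate a service account key document and return its client_email."""
    info = json.loads(raw_json)
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise ValueError("missing keys: " + ", ".join(sorted(missing)))
    if info["type"] != "service_account":
        raise ValueError("not a service account key file")
    return info["client_email"]


# Placeholder document; never commit or print a real key file.
sample = json.dumps({
    "type": "service_account",
    "project_id": "project-name",
    "private_key": "[very very long hash]",
    "client_email": "some-user@project-name.iam.gserviceaccount.com",
    "token_uri": "https://accounts.google.com/o/oauth2/token",
})
email = service_account_email(sample)
```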
client_email address from that JSON file.
Setup AWS User
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1508430268000",
"Effect": "Allow",
"Action": [
"s3:*"
],
"Resource": [
"arn:aws:s3:::[bucket_name]/*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::[bucket_name]"
]
}
]
}
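To avoid hand-editing the [bucket_name] placeholders, the policy can be generated for a given bucket. This is a sketch only: the function name bucket_policy and the bucket name are hypothetical, and the optional Sid from the example above is left out.

```python
import json


def bucket_policy(bucket_name):
    """Return the policy above as a dict, filled in for one bucket."""
    bucket_arn = "arn:aws:s3:::" + bucket_name
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access: applies to every key inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:*"],
                "Resource": [bucket_arn + "/*"],
            },
            {
                # Listing is a bucket-level action, so it targets the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
        ],
    }


# Hypothetical bucket name; paste the resulting JSON into the policy editor.
policy_json = json.dumps(bucket_policy("my-example-bucket"), indent=2)
```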
To run the syncDriveFolder.py script, put the credentials file from Google and the credentials file from AWS in the root folder of the project, then run the script like this:
python syncDriveFolder.py -f [name_of_drive_folder] -n [name_of_election]
A full list of options for that script can be seen by running python syncDriveFolder.py --help:
usage: syncDriveFolder.py [-h] [--aws-creds AWS_CREDS]
                          [--google-creds GOOGLE_CREDS] -n ELECTION_NAME -f
                          DRIVE_FOLDER [--capture-hierarchy]

Sync and convert images from a Google Drive Folder to an S3 Bucket

optional arguments:
  -h, --help            show this help message and exit
  --aws-creds AWS_CREDS
                        Path to AWS credentials. (default:
                        /home/eric/code/election-transcriber/credentials.csv)
  --google-creds GOOGLE_CREDS
                        Path to Google credentials. (default:
                        /home/eric/code/election-transcriber/credentials.json)
  -n ELECTION_NAME, --election-name ELECTION_NAME
                        Short name to be used under the hood for the election
                        (default: None)
  -f DRIVE_FOLDER, --drive-folder DRIVE_FOLDER
                        Name of the Google Drive folder to sync (default:
                        None)
  --capture-hierarchy   Capture a geographical hierarchy from the name of the
                        file. (default: False)
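The help output above corresponds to an argparse interface along the following lines. This is a reconstruction from the help text, not the script's actual source: the function name build_parser and the project_root parameter are assumptions, and the real script may compute its default paths differently.

```python
import argparse
import os


def build_parser(project_root="."):
    """Build a parser that mirrors the options listed in the help output."""
    parser = argparse.ArgumentParser(
        prog="syncDriveFolder.py",
        description="Sync and convert images from a Google Drive Folder to an S3 Bucket",
        # Appends "(default: ...)" to each help string, as seen in the output above.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "--aws-creds",
        default=os.path.join(project_root, "credentials.csv"),
        help="Path to AWS credentials.",
    )
    parser.add_argument(
        "--google-creds",
        default=os.path.join(project_root, "credentials.json"),
        help="Path to Google credentials.",
    )
    parser.add_argument(
        "-n", "--election-name", required=True,
        help="Short name to be used under the hood for the election",
    )
    parser.add_argument(
        "-f", "--drive-folder", required=True,
        help="Name of the Google Drive folder to sync",
    )
    parser.add_argument(
        "--capture-hierarchy", action="store_true",
        help="Capture a geographical hierarchy from the name of the file.",
    )
    return parser


# Parse an example command line (folder and election names are made up).
args = build_parser().parse_args(["-f", "Example Folder", "-n", "example_election"])
```

Both -n and -f are required, so omitting either one makes the parser exit with a usage error.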