cjpatton / seaice

Code for the online YAMZ metadictionary (formerly seaice).
http://yamz.net
Other
1 stars 1 forks source link

This is the README for the YAMZ metadictionary and includes instructions for deploying on a local machine for testing and on Heroku (heroku.com) for a scalable production version. These assume a Ubuntu GNU/Linux environment, but should be easily adaptable to any system; YAMZ is written in Python and uses only cross-platform packages.

Authored by Chris Patton. Last updated 28 Sep 2014.

YAMZ is formerly known as SeaIce; for this reason, the database tables and API are named for SeaIce.

Contents

  1. Prerequites 0.1 Repository structure

  2. Configuring a local instance 1.1 Postgres authentication 1.2 Create the database 1.3 Create a role for standard queries 1.4 Oauth credentials and app key 1.5 N2T persistent identifier credentials 1.6 Test the instance

  3. Deploying to Heroku 2.1 Heroku-Postgres 2.2 Mailgun 2.3 Heroku-Scheduler 2.4 Making changes 2.5 Exporting the dictionary

  4. URL forwarding

  5. Building the docs

  1. Prerequisites

The contents of this directory are as follows:

sea.py . . . . . . . . . . Consolue utility for scoring and classifying terms and other things.

ice.py . . . . . . . . . . Web server front end.

digest.py . . . . . . . . Console email notification utility.

requirements.txt . . . . . Heroku package dependencies.

Procfile . . . . . . . . . Heroku configuration.

seaice/ . . . . . . . . . The SeaIce Python module.

html/ . . . . . . . . . . HTML templates, static Javascript and CSS, including bootstrap.js.

doc/ . . . . . . . . . . . API documentation and tools for building it.

.seaice/.seaic_auth . . . DB credentials, API keys, app key, etc. Note that these files are just templates and don't contain actual keys.

Before you get started, you need to set up a database and some software packages. On Ubuntu, grab the follwoing:

python-flask . . . . . . . Simple HTTP server.

postgresql . . . . . . . . We're using PostgreSQL for databse managment.

python-psycopg2 . . . . . Python API for PostgreSQL.

python-pip . . . . . . . . Package manager for additional Python programs.

We need to download a package from pip that handles configuration files nicely. Do:

$ sudo pip install configparser flask-login flask-Oauth python-dateutil urlparse

0.1 Repository structure

The 'master' branch contains all the code to deploy locally or on heroku. This directory. To deploy, create a local branch called "deploy_keys" and edit .seaice .seaice_auth with actual API and app keys. Then push to heroku with git push heroku deploY_keys:master. See section 2 for more on heroku. NEVER PUSH THIS BRANCH TO GITHUB.

  1. Configuring a local instance

To start, we'll set up a database in postgres. First, we need to do some configuration. Postgres requires an administrative user called 'postgres'. It may be a good idea to create a SeaIce user (called "role" in postgres jargin) with read/ write access granted on the tables. First, set postgres' password:

$ sudo -u postgres psql template1 template1=# alter uesr postgres with encrypted password 'PASS';
template1=# \q [quit]

1.1 Posgres authentication

Now configure the authentication method for postgres and all other users connecting locally. In /etc/postgresql/9.1/main/pg_hba.conf, change "peer" in line

local all postgres peer

to "md5" for the administrative account and local unix domain socket connections. Next, we want to only be able to connect to the database from the local machine. In /etc/postgresql/9.1/main/postgresql.conf, uncomment the line

listen_addresses = 'localhost'

After you've done this, you need to restart the postgres server:

$ sudo service postgresql restart

1.2 Create the database

Finally, log back into postgres to create the database:

$ sudo -u postgres psql postgres=# create database seaice with owner postgres;

(Using unique, completely random passwords is a good idea here.) Next, create a configuration file for the database and user account you set up. Create a file called .seaice like:

[default] dbname = seaice user = postgres password = PASS

IMPORTANT NOTE: A template of this file is provided in the github repository. This file should remain secret and must not be published. Set the correct file permissions with:

$ chmod 600 .seaice

This file is used by the SeaIce DB connector to grant access to the database. To initialize the DB schema and tables, type:

$ ./sea.py --init-db --config=.seaice

1.3 Create a role for standard queries

At this point, it's suggested that you set up a user standard read/write permssions on the table (no DROP, CREATE, GRANT, etc.) for most of the database queries. Note that this isn't applicable in Heroku; the postgres interface there doesn't allow you to control user views.

postgres=# create user contributor with encrypted password 'PASS'; postgres=# \c seaice; postgres=# grant select, insert, update, delete on all tables in schema SI, SI_Notify to contributor;

Add the configuuration to .seaice:

[contributor] dbname = seaice user = contributor password = PASS

The web user interface creates a database connection pool with the same role. You can specify this on the command line:

$ ./ice.py --role=contributor --config=.seaice

'--role' defaults to 'default'.

1.4 Oauth credentials and app key

YAMZ uses Google for third party authentication (OAuth-2.0) management of logins. Visit https://console.developers.google.com to set this service up for your instance. Navigate to "APIs & auth" -> "Credentials" and click "Create new client ID". For local configuration:

Application type . . . . . . . . . . . . Web application Authorized javascript origins . . . . . http://localhost:5000 Authorized redirect URI . . . . . . . . http://localhost:5000/authorized

You'll create a new client ID for your heroku instance. (See section 2.) Next, create a configuration file called .seaice_auth with the appropriate client ID's and secret keys. For instance, you may have credentials for 'http://localhost:5000', as well as a dpeloyment on heroku:

[dev] google_client_id = 000-fella.apps.googleusercontent.com google_client_secret = SECRET1 app_secret = SECRET2

[heroku] google_client_id = 000-guy.apps.googleusercontent.com google_client_secret = SECRET3 app_secret = SECRET4

IMPORTANT NOTE: A template of this file is provided in the github repository. This file should remain secret and must not be published. We provide the template, since heroku requires a commited file.

For conveniance, this file will also keep the Flask app's secret key. For this key, enter a long, random string of characters. Finally, set the correct file permissions with:

$ chmod 600 .seaice_auth

1.5 N2T persistent identifier credentials =========================================

A third-party URI resolver (maintianed by John Kunze et al.) is now an integrated part of YAMZ. It is therefore necessary to provide a minter password for API access to this web service. Include a line in .seaice_auth for every view:

minter_password = PASS

1.6 Test the instance

First, create the database schema:

$ ./sea --config=.seaice --init-db

Start the local server with:

$ ./ice.py --config=.seaice --deploy=dev

If all goes well, you should be able to navigate to your server by typing 'http://localhost:5000' in the address bar. To verify that you've set up Google Oauth-2.0 correctly, try logging in. This will create an account. Try adding a new term, modifying and deleting a term, and commenting on termss. To classify a term, do:

$ ./sea.py --config=.seaice --classify-terms

  1. Deploying to Heroku

The YAMZ prototype is currently hosted at http://yamz.herokuapp.com. Heroku is a cloud-computing service which allows users to host web-based software projects. Heroku is scalable for a price; however, we can still achieve quite a bit without spending money. We have access to a small Postgres database, can shedule jobs, use a variety of packages (all we need are available), and deploy easily with Git. It is however impossible to set up DB roles. Also, we can't assume persistent access to the filesystem.

To begin, you need to setup an account with Heroku and download their software. (It's nothing major, just some tools for running commands, interacting with the database, etc.) Visit http://www.heroku.com.

Heroku requires a couple additional configuration files and some small code changes. The additional files are:

Procfile . . . . . . . specifies the commands that start web server, as well as periodic jobs.

requirements.txt . . . a list of packages required by our software that Heroku needs to make available. These are available via pip.

I used the following tutorial: https://devcenter.heroku.com/articles/python to set these up. Note that these steps have already been done; once you've set up your heroku account, you're ready to deploy.

The recommended best practice for managing your heroku instance is to set up a local branch called 'deploy_keys' based on 'master'. In this branch, edit .seaice_auth to contain actual API and app keys. NOTE: IT IS CRITICAL THAT YOU DON'T PUSH THIS BRANCH TO GITHUB. Publishing these secrets comprimises the security of the entire app.

Login via the heroku website and create a new app. (Suppose we've named it "fella".) Navigate to the directory containing the cloned repository. Create and checkout the branch 'deploy_keys'.

$ heroku login $ heroku git:remote -a fella $ git push heroku deploy_keys:master

This creates a "slug" containing our code and its dependencies. To get the web app running, we'll now need to set up a database and a couple heroku backend services.

2.1 Heroku-Postgres

Heroku-Postgres is a scalable DB interface for heroku apps. Follow the following tutorial to set this up: https://devcenter.heroku.com/articles/heroku-postgresql#connection-in-python

The 'master' branch is set up to use either a local postgres database server or Heroku-Postgres. The location of the DB in the "cloud" is specified by the environment variable "DATABASE_URL". Using 'sea' or 'ice' with '--config=heroku' will force SeaIce to use this variable to connect to the DB. (Note this is the default.) Heroku-Postgres doesn't allow you to create roles, so '--role' will be ignored and the default will be used. To create the database schema:

$ heroku run python sea.py --init-db

2.2 Mailgun

YAMZ provides an email notification service for users who opt in. A utility called 'digest' collects for each user all notifications that haven't previously been emailed into a single digest. The code uses a heroku backend app called Mailgun for SMTP service. To set this up, simply type

$ heroku addons:add mailgun

The code uses environment variables "MAILGUN_SMTP_LOGIN" and "MAILGUN_SMTP_PASSWORD" to connect to Mailgun. To send out notifications, type:

$ heroku run python digest.py

2.3 Heroku-Scheduler

There are two periodic jobs that need to be scheduled in YAMZ: the term classifier and the email digest. To set this up, do:

$ heroku addons:add scheduler $ heroku addons:open scheduler

The second command will take you to the web interface for the scheduler. Add the following two jobs:

"python sea.py --classify-terms" . . . . . every 10 minutes "python digest.py" . . . . . . . . . . . . once per day

2.4 Making changes

Deploying changes to heroku is made very easy with Git. Suppose we have changes to 'master' that we want to push to heroku. First checkout the already created local 'deploy_keys' branch:

$ git checkout deploy_keys $ git merge master $ git push heroku deploy_keys:master

2.5 Exporting the dictionary

The SeaIce API includes queries for importing and exporting database tables in JSON formatted objects. This could be used to backup the entire database. Note however that imports must be done in the proper order in order to satisfy foreign key constratins. To back up the dictionary, do:

$ heroku config | grep DATABASE_URL DATABASE_URL: $ export DATABASE_URL= $ ./sea.py --config=heroku --export=Terms >terms.json

  1. URL forwarding

The current stable implementation of YAMZ is redirected from http://yamz.net. Setting this up takes a bit of doing. The following instructions are synthsized from http://lifesforlearning.com/heroku-with-godaddy/ for redirecting a domain name managed by GoDaddy to a Heroku app.

Launch the "Domains" app on GoDaddy. Under "Forward Domain" for the appropriate domain (let's call it "fella.org"), add the following settings:

Forward to . . . . . . . . . . . . . . . . . . . . http://www.fella.org Redirect type . . . . . . . . . . . . . . . . . . 301 (Permanent) Forward settings . . . . . . . . . . . . . . . . . Forward only Update nameservers and DNS settings to support this change . . . . . . . . . yes

Next, under "Manage DNS", remove all entries except for 'A (Host)' and 'NS (Nameserver)', and add the following under 'CName (Alias)':

Record type . . . . . . . . . . . . . . . . . CNAME (Alias) Host . . . . . . . . . . . . . . . . . . . . . www Points to . . . . . . . . . . . . . . . . . . http://fella.herokuapp.com TTL . . . . . . . . . . . . . . . . . . . . . 1 Hour

Next, change the IP address for entry '@' under 'A (Host)' to 50.63.202.31.

That's it for DNS configuration. The last thing we need to do is modify the redirect URLS in the Google Oauth API. Edit the authorized javascript origins and redirect URI by replacing "fella.herokuapp.com" with "fella.org" and save.

It can take a couple hours to a day for your DNS settings to propogate. Once it's done, you can navigate to YAMZ by typing "fella.org" into your browser. Try logging in to verify that the Oauth settings are also correct.

  1. Building the docs

The seaice package is autodoc'ed using python-sphinx. To install on Ubuntu:

$ sudo apt-get install python-sphinx

The directory doc/sphinx includes a Makefile for exporting the docs to various media. For example,

make html make latex