ShorensteinCenter / Benchmarks-Program

Free, open source data science metrics for MailChimp email lists, delivered via an email report
https://emailbenchmarking.com
MIT License

Shorenstein Center Email Benchmarks

This is a tool developed by the Shorenstein Center at the Harvard Kennedy School to import MailChimp email list data, analyze it, and output the resulting metrics in an email report.

Status

Branch | Tests | Code Coverage | Comments
master | CircleCI | codecov | Latest official release

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See the Deployment section for notes on how to deploy the project on a live system.

Prerequisites

Local Development

Create a new virtual environment
virtualenv venv
source venv/bin/activate
Install Python dependencies
pip install -r requirements.txt
Set environment variables

If NO_EMAIL is not set, Amazon SES is required along with the following variables:

The following variables are only required to run integration tests:

Upgrade the database
export FLASK_APP=app.py
flask db upgrade
Install Node dependencies
npm install

You may need to add the installed binaries to your system path (or install with the -g flag), as the application expects to find certain executables (such as orca).

Compile front-end
npm run gulp
Run the application
flask run
Run Celery
celery worker -A app.celery --loglevel=INFO

Finally, open a web browser and navigate to the SERVER_NAME URI.
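
Before starting the application, it can help to confirm that the executables and settings mentioned in the steps above are in place. The following is a minimal sketch and not part of the project: the function names are hypothetical, the executable list contains only orca (the one binary named above), and treating NO_EMAIL as "set" whenever it is present in the environment is an assumption about how the application reads it.

```python
import os
import shutil


def missing_executables(names):
    """Return the names that cannot be found on the system PATH."""
    return [name for name in names if shutil.which(name) is None]


def email_enabled(environ):
    """Assume email sending is enabled unless NO_EMAIL is present."""
    return "NO_EMAIL" not in environ


if __name__ == "__main__":
    # "orca" is the only executable the steps above name explicitly.
    missing = missing_executables(["orca"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected executables found on PATH")

    if email_enabled(os.environ):
        print("NO_EMAIL is not set: Amazon SES settings are required")
    else:
        print("NO_EMAIL is set: Amazon SES settings are not required")
```

If orca is reported as missing, adding the local Node binaries directory to the system path (or installing with the -g flag) is the fix suggested above.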

Testing

Run unit and integration tests with pytest:

python -m pytest tests/unit
python -m pytest tests/integration

To generate a coverage report as well:

python -m pytest --cov=app --cov-report term-missing tests/unit

Linting

Lint the backend with pylint:

pylint app

Lint the frontend:

npm run lint

Python and JavaScript rules are defined in pylintrc and .eslintrc, respectively.

Deployment

This app is environment-agnostic. We deployed it on Ubuntu using gunicorn and nginx, and daemonized Celery and Celery Beat. Here are a few pointers on what we did.

A sample systemd unit file for gunicorn:

[Unit]
Description=Gunicorn instance to serve app
After=network.target

[Service]
User=app_user
Group=www-data
WorkingDirectory=/path/to/app
Environment="PATH=/path/to/app/venv/bin"
ExecStart=/path/to/app/venv/bin/gunicorn --workers 5 --bind unix:email-benchmarks.sock -m 007 app:app

[Install]
WantedBy=multi-user.target
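
The gunicorn flags passed in ExecStart can alternatively be kept in a Python configuration file. This is a sketch of an equivalent file rather than something the project ships: the file name gunicorn.conf.py is an assumption, while workers, bind and umask are gunicorn's setting names for --workers, --bind and -m.

```python
# gunicorn.conf.py (hypothetical file name; not part of the original repo)

# Number of worker processes, equivalent to --workers 5.
workers = 5

# Unix socket path, resolved relative to the working directory, as with --bind.
bind = "unix:email-benchmarks.sock"

# Umask for files gunicorn creates, including the socket; equivalent to -m 007.
umask = 0o007
```

With such a file in the working directory, the ExecStart command could be shortened to /path/to/app/venv/bin/gunicorn -c gunicorn.conf.py app:app.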

A sample nginx server block:

server {
    listen 80;
    server_name SERVER_NAME;

    location / {
        include proxy_params;
        proxy_pass http://unix:/path/to/app/email-benchmarks.sock;
    }
}

Sample init scripts for Celery can be found in the Celery repo.

Setting up Orca (required for exporting visualizations from Plotly) can be tricky on headless machines. We got it to work by installing the standalone binaries and additional dependencies (such as google-chrome-stable) as per the readme, then running it under Xvfb with the -a flag (i.e. xvfb-run -a ...). Note that restarting a daemonized Celery creates a new xvfb instance rather than reusing the one that is already running, so old instances are left behind. We added the following function to our Celery init script to kill any running xvfb processes:

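kill_xvfb () {
    # Collect the PIDs of any Xvfb processes started through xvfb-run.
    local xvfb_pids
    xvfb_pids=$(ps aux | grep tmp/xvfb-run | grep -v grep | awk '{print $2}')
    if [ -n "$xvfb_pids" ]; then
        echo "Killing the following xvfb processes: $xvfb_pids"
        # Left unquoted on purpose so that each PID becomes its own argument.
        sudo kill $xvfb_pids
    else
        echo "No xvfb processes to kill"
    fi
}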
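
For readers less familiar with shell pipelines, the PID selection performed by kill_xvfb can be sketched in Python. This is an illustration only and not code from the project: the function name find_xvfb_pids and the sample ps output are made up, and the function only parses text; it does not kill anything.

```python
def find_xvfb_pids(ps_output, marker="tmp/xvfb-run"):
    """Return PIDs (as strings) from `ps aux` output for lines matching marker."""
    pids = []
    for line in ps_output.splitlines():
        # Equivalent of `grep tmp/xvfb-run | grep -v grep`.
        if marker not in line or "grep" in line:
            continue
        fields = line.split()
        # Equivalent of `awk '{print $2}'`: the second column of `ps aux` is the PID.
        if len(fields) >= 2:
            pids.append(fields[1])
    return pids


if __name__ == "__main__":
    # Made-up sample resembling `ps aux` output.
    sample = (
        "USER  PID %CPU %MEM COMMAND\n"
        "app  1234  0.0  0.1 Xvfb :99 -auth /tmp/xvfb-run.AbCdEf/Xauthority\n"
        "app  5678  0.0  0.0 grep tmp/xvfb-run\n"
    )
    print(find_xvfb_pids(sample))  # prints ['1234']
```

The grep process itself matches the search pattern, which is why both the shell function and this sketch filter out lines containing "grep".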

Authors

Acknowledgements

This project is generously supported by the Knight Foundation.

We use BrowserStack to help ensure our projects work across platforms and devices.

License

This project is licensed under the MIT License; see the LICENSE file for details.