
Bypass-OTF_Proxy

Version 0.2

Overview

This repository contains three applications:

- A command-line mirror/proxy management application (automation.py)
- A log reporting and analysis application (move_logs.py and log_stats.py)
- A Flask web application

System

All of this has been tested on Ubuntu 18.04 LTS. It should work on any Ubuntu/Debian-based system. It has not been tested on macOS or Windows.

Prerequisites

You need (* are required):

If you want to add onions, the best method is using Alec Muffett's EOTK (Enterprise Onion ToolKit). One way to mine vanity .onion addresses is to use eschalot. At this time, onion addition is not automated.

Setup

git clone https://github.com/OpenTechFund/bypass-otf_proxy
cd bypass-otf_proxy
pipenv install
pipenv shell
cd bcapp/flaskapp
git clone git@github.com:fastly/fastly-py.git

You can use any other Python environment manager you choose; if you do, use the requirements file instead of the Pipfile.
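
For example, using the built-in venv module instead of pipenv might look roughly like this (assuming the requirements file sits at the repository root as requirements.txt):

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt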

Setting up the Database

In order to report on domains using this command-line app, you'll need to make sure the database is set up. You can use SQLite, PostgreSQL, or MySQL. Add the database URL to the .env file (see the .env file creation docs in the Flask app documentation).
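
For example, a PostgreSQL URL in the .env file might look like this (the variable name and credentials are placeholders; see the Flask app documentation for the exact format it expects):

DATABASE_URL=postgresql://dbuser:dbpassword@localhost:5432/bypass_otf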

Once the database is set up, and accessible, and you are in the virtual environment:

cd bcapp/flaskapp/app
flask db init
flask db migrate
flask db upgrade

Mirror Application

The use case for this application is that there are websites that have been censored by some state actor and that you want people to have access to. This application allows you to set up and maintain proxy "mirrors" using CDNs (Content Delivery Networks), as well as real mirror URLs (manually), onion addresses (manually), and IPFS nodes (manually).

Usage

Usage: automation.py [OPTIONS]

Options:
  --testing                       Domain testing of all available mirrors and
                                  onions
  --domain TEXT                   Domain to act on
  --test                          Test when listing domain
  --proxy TEXT                    Proxy server to use for testing/domain
                                  detail.
  --existing TEXT                 Mirror exists already, just add to github.
  --replace TEXT                  Mirror/onion to replace.
  --delete                        Delete a domain from list
  --log [enable|disable]          Enable or Disable Logging
  --s3 TEXT                       Add this s3 log storage bucket
  --remove TEXT                   Mirror or onion to remove
  --domain_list                   List all domains and mirrors/onions
  --mirror_type [cloudfront|azure|fastly|onion|mirror|ipfs]
                                  Type of mirror
  --report                        Get report from api database
  --generate_report               Generate report and possibly send email to
                                  admins, etc.
  --mode [daemon|web|console]     Mode: daemon, web, console
  --ooni INTEGER                  OONI Probe Data set range
  --missing [cloudfront|azure|fastly|onion|mirror|ipfs|domain]
                                  Get missing for alternative type or domain -
                                  use 'domain' or 'cloudfront', '
  --help                          Show this message and exit.

Listing

To get a list of all domains and mirrors use: python automation.py --domain_list

To get a list of one domain and its mirrors use: python automation.py --domain=domain.com

To test that domain in addition, use: python automation.py --domain=domain.com --test

(Note: This also works with URLs. If the URL has a '&', use quotes in this request - e.g. --domain='http://www.youtube.com/watch?v=xxxxxxxxxxxx&list=WL')

Testing:

python automation.py --testing

This goes through the list of all domains, testing each domain, mirror, and onion (IPFS testing forthcoming), and adding the results to the database.

If you run python automation.py --testing --mode=daemon via cron, it will test all sites and report their status to the database.

Domain addition:

To add an existing mirror (one that you have already set up, including onions) use:

python automation.py --domain=domain.com --existing=domain_mirror.com

This will add a mirror (or onion, if it is a .onion) to the json file. If the domain doesn't exist in the json file, it will add the domain.

To add a new mirror automatically for Cloudfront, Fastly, or Azure use:

python automation.py --domain=domain.com --mirror_type=cloudfront|fastly|azure|onion|ipfs

(The cloudfront, fastly, and azure processes are automated. The onion and IPFS processes are not yet.)

If you want a cloudfront distribution, it will create that for you and tell you the domain. For Fastly and Azure, you'll have to specify the subdomain yourself (Cloudfront assigns a subdomain for you; Fastly and Azure require you to define it).

All configurations are in auto.cfg (see auto.cfg-example)

Mirror replacement

To replace one mirror with another use:

python automation.py --domain=domain.com --replace=oldmirror.com --existing=newmirror.com

or (implemented for cloudfront so far)

python automation.py --domain=domain.com --replace=oldmirror.com --mirror_type=cloudfront|fastly|azure|ipfs

If the mirror_type is defined, the replacement will be automated, and whatever is needed to reset the mirror url will be done.

Domain Deletion

To delete an entire domain and its mirrors/onions, use:

python automation.py --domain=domain.com --delete

Add S3 bucket

To add an S3 bucket as a place to hold logs for a domain use:

python automation.py --domain=domain.com --s3 NAME_OF_BUCKET

Add logging

Automatically add logging for a domain. This only works for cloudfront domains, and an S3 bucket for the domain has to have been set up previously.

python automation.py --domain=domain.com --add_logging

Reporting

To get a raw report of all of the testing reports for that domain:

python automation.py --domain=domain.com --report

To generate an email report summarizing all negative reports for all domains:

python automation.py --generate_report

Mode

Daemon mode is for things like cron jobs - it suppresses output.

Finding missing services

If you want to know which domains don't have a particular service proxy, use:

python automation.py --missing=cloudfront (It is most useful for cloudfront, as you can create unlimited cloudfront proxies.)

Notes

There are some defaults for all four systems, and if you want to change those, you would need to go to the documentation for each and modify the code:

Problems you might encounter:

Ongoing reporting

To do ongoing reporting on domain and alternative status, set up a cron job with the following format:

30 08 * * * cd /path/to/bypass-otf_proxy/bcapp/flaskapp/; ~/path/to/venv/bin/python automation.py --testing --mode=daemon

This would run a test of all domains and alternatives each day at 8:30am.

To send a daily report by email (email setup in auto.cfg) use this:

01 04 * * * cd /path/to/bypass-otf_proxy/bcapp/flaskapp/; ~/path/to/venv/bin/python automation.py --generate_report --mode=daemon

OONI Reporting

This is designed to be run regularly via cron. To probe all OONI data from domains in the repository:

03 09 * * * cd /path/to/bypass-otf_proxy/bcapp/flaskapp/; ~/path/to/venv/bin/python automation.py --ooni=7 (the last 7 days of OONI data will be added to the database.)

To Use IPFS (Plus YouTube Dowloader)

Install IPFS

Follow these instructions from IPFS.

Initialize the repository: ipfs init --profile server. Copy the IPFS peer identity and place it in the auto.cfg file under [SYSTEM] (You can leave out the --profile server option if you are not running this in a datacenter.)
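
For example (the peer identity string is a placeholder, and the exact key name under [SYSTEM] is an assumption; check auto.cfg-example for the real one):

ipfs init --profile server
# prints a line like: peer identity: QmYourPeerIdentityHere

# then, in auto.cfg:
[SYSTEM]
ipfs_peer_identity = QmYourPeerIdentityHere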

Add ipfs as a service to your server. Create /etc/systemd/system/ipfs.service:

[Unit]
Description=IPFS daemon
After=network.target
[Service]
User=ubuntu
ExecStart=/usr/local/bin/ipfs daemon
[Install]
WantedBy=multi-user.target

Start the service:

sudo service ipfs start
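
To have the daemon start automatically after a reboot, you can also enable the unit:

sudo systemctl enable ipfs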

Install YouTube Downloader

sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl

Make sure that python is correctly associated with python3 in update-alternatives
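
One way to do that on Ubuntu (assuming python3 is installed at /usr/bin/python3) is:

sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 1
python --version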

Add the following configuration file for the user who is running this app (such as 'ubuntu'); youtube-dl typically reads the user configuration from ~/.config/youtube-dl/config:

# Lines starting with # are comments

# Always extract audio
#-x

# Do not copy the mtime
#--no-mtime

# Save all videos under /var/www/ipfs/
-o /var/www/ipfs/%(id)s.%(ext)s

Configuration options can be found in the youtube-dl repository.
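
As a rough sketch of the overall flow (the video ID and file extension below are placeholders, and whether you run these steps by hand or let the app do it depends on your setup), downloading a video and adding it to IPFS might look like:

youtube-dl https://www.youtube.com/watch?v=VIDEOID
ipfs add /var/www/ipfs/VIDEOID.mp4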

Log Reporting Analysis Application

This part of the application requires moving logs from wherever they are generated to S3.

Moving local logs to S3

The log reporting app only analyzes log files from S3 (or Azure Storage, forthcoming). If you have local files (onion, mirror, etc.) to analyze, you must move them to S3. move_logs.py does that for you.

Usage: move_logs.py [OPTIONS]

  Move logs from local to s3

Options:
  --daemon         Run in daemon mode. All output goes to a file.
  --zip            Save zipped log files
  --recursive      Descend through directories
  --range INTEGER  Days of log file age to save. Default is 7
  --help           Show this message and exit.

A periodic cron job like this will do the trick:

15 12 * * 1 cd /path/to/bypass-otf_proxy/bcapp/flaskapp/; ~/path/to/venv/bin/python move_logs.py --daemon --range=10 --recursive --zip

This will move the last 10 days of log files to S3 every Monday at 12:15pm.

Local log file configurations are in auto.cfg. auto.cfg points to a paths file, with the format:

domain1|/path/to/domain1/logfiles
domain2|/path/to/domain2/logfiles

Remote Log Analysis

For local logs moved to S3, Cloudfront logs, and Fastly logs, the analysis script runs through the domain list, and if there is an S3 bucket set in the database, it will go through all of the files, and compile them by type. There are three types: 'nginx', 'cloudfront', and 'fastly'. Reporting is generated by log type.

Generating Cloudfront Logs

In order to get Cloudfront logs to S3, you need to configure Cloudfront to do that. Once you do, add the S3 bucket where those logs are stored to the domain's configuration (for example, with the --s3 option described above).

Streaming Fastly logs to S3

If you follow these instructions it should be fairly straightforward. However, the Fastly logs are stored under individual services, and if you have multiple domains in one service, those logs will be aggregated in one S3 bucket. You must use the following log format: %v %h %t %m "%r" %>s (domain, remote IP address, time of request, method, "line of request", and final status; see this page for more details on the log format).
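
For reference, a log line in that format would look roughly like this (all values are illustrative):

www.example.com 203.0.113.7 [21/Apr/2021:18:12:09 +0000] GET "GET /index.html HTTP/1.1" 200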

Generating AzureEdge Logs

Following these instructions you can stream logs (at the profile level) to an Azure storage account. One note: make sure to register your subscription (Azure Portal > Subscriptions > Choose yours > Resource Providers) with the microsoft.insights resource provider.

Then go to the Azure portal, find the access keys for that storage account, and add them to auto.cfg in the appropriate settings under the AZURE section.
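
That section might end up looking something like this (the key names and values are placeholders; use whatever names auto.cfg-example actually defines):

[AZURE]
storage_account = yourstorageaccount
access_key = paste-the-access-key-from-the-portal-here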

Make sure to add the Azure profile name to the domain. At the moment, that can be done only from the web interface.

Analyzing Logs

Usage: python log_stats.py [OPTIONS]

Options:
  --percent INTEGER    Floor percentage to display for agents and codes
                       (default is 5%)
  --num INTEGER        Top number of pages to display (default is 10)
  --unzip              Unzip and analyze zipped log files (bz2 files only)
  --daemon             Run in daemon mode. Suppresses all output.
  --range              Days of log file age to analyze. Default is 7
  --domain             Domain to analyze. Default is 'all'
  --help               Show this message and exit.

If you choose a single domain, this application will go to the database, and look to determine if there is an S3 bucket specified, grab all files within 'range' and analyze in bulk. If you don't have a bucket specified, it will skip the domain. If you don't specify a domain, it will check all domains in the database, and analyze any files found in specified S3 buckets.
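
For example, to analyze the last 30 days of logs for a single domain, using the options listed above (the domain and range values are just illustrative):

python log_stats.py --domain=domain.com --range=30 --unzip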

(TBD): If there is an Azure storage location specified, this will first move logs from Azure to S3.

Reports on this analysis are stored in the S3 log buckets, as well as the database. If you want periodic analysis (like once a week), set up a cron job like this:

30 12 * * 1 cd /path/to/bypass-otf_proxy/bcapp/flaskapp/; ~/path/to/venv/bin/python log_stats.py --unzip

This would go through all domains, report on the ones with log files in S3 buckets, and store the results in the database for reporting on the application front end (see the docs on the Flask application).

Using your own external analytics program

If you have an external analytics platform, such as Google Analytics, that uses a JavaScript snippet to track visits, you should be able to track visits through the proxies and onions as well. The challenge is exposing the URL of the actual page visited. Most analytics packages display only the path of the page, not the domain, but they do have the domain data; it just needs to be exposed. For example, in Google Analytics you can use filters to filter data from a particular hostname (such as your .onion or your proxy/mirror).

Flask Application

See documentation here