ail-project / ail-splash-manager

Deprecated: AIL crawler has been upgraded to https://github.com/ail-project/lacus
GNU General Public License v3.0

Deprecated: AIL v5.0 crawler has been upgraded to Lacus

https://github.com/ail-project/lacus

AIL no longer relies on any Docker image.

ail-splash-manager

AIL crawlers use Splash to fetch and render a domain.
This Flask server simplifies the installation and management of the Splash Docker containers.

Installation

git clone https://github.com/ail-project/ail-splash-manager.git
cd ail-splash-manager
./install.sh

Usage

Launching AIL Splash Manager

./LAUNCH.sh -l

Killing AIL Splash Manager and all Splash Docker containers

./LAUNCH.sh -k

Launching AIL Splash Manager Tests

./LAUNCH.sh -t

Tor proxy

Installation

The tor proxy from the Ubuntu package is installed by default.

This package is outdated: some v3 onion addresses are not resolved.

/!\ Install the tor proxy provided by the Tor Project to solve this issue. /!\

Note: on Ubuntu, add the Tor Project repository to the apt sources:

sudo sh -c 'echo "deb https://deb.torproject.org/torproject.org $(lsb_release -sc) main" >> /etc/apt/sources.list.d/tor-project.list'

Once installed, all Splash Docker containers must be able to reach this proxy. You can use the configure_tor script or configure it yourself.
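To check that the proxy is reachable, a plain TCP connection test is usually enough. The following is a minimal sketch, not part of this repository: the function name can_connect is hypothetical, and the host and port are the values used in the default proxy profile of this README (172.17.0.1 and 9050), which may differ on your setup. It only checks that the port accepts connections, not that Tor itself works.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: Tor listens on the Docker bridge address used in
    # config/proxies_profiles.cfg; adjust to your own torrc.
    host, port = "172.17.0.1", 9050
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"tor proxy {host}:{port} is {state}")
```

Run it from the host first. To verify the path the Splash containers actually use, run the same check from inside a container attached to the same Docker network.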

Configuration

AIL framework crawlers configuration:

Proxies:

Edit config/proxies_profiles.cfg:

[default_tor] # section name: proxy name
host=172.17.0.1
port=9050
type=SOCKS5
description=tor default proxy
crawler_type=tor

Splash Dockers:

Edit config/containers.cfg:

[default_splash_tor] # section name: splash name
proxy_name=default_tor
port=8050-8052
cpu=1
memory=1
maxrss=2000
description= default splash tor
net=bridge
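The two files are linked through proxy_name: each section in containers.cfg names a section in proxies_profiles.cfg. The sketch below shows one way to read that relationship with Python's configparser. It is an illustration, not the manager's own loader: the helper names (load, expand_ports, resolve) are hypothetical, and treating the port range 8050-8052 as inclusive (one port per Splash container) is an assumption.

```python
import configparser

PROXIES_CFG = """
[default_tor] # section name: proxy name
host=172.17.0.1
port=9050
type=SOCKS5
description=tor default proxy
crawler_type=tor
"""

CONTAINERS_CFG = """
[default_splash_tor] # section name: splash name
proxy_name=default_tor
port=8050-8052
cpu=1
memory=1
maxrss=2000
description= default splash tor
net=bridge
"""


def load(text: str) -> configparser.ConfigParser:
    """Parse INI text, dropping the trailing '# ...' comments."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser


def expand_ports(spec: str) -> list[int]:
    """Turn '8050-8052' into [8050, 8051, 8052] (assumed inclusive)."""
    first, _, last = spec.partition("-")
    start = int(first)
    end = int(last) if last else start
    return list(range(start, end + 1))


def resolve(containers, proxies) -> dict:
    """Map each splash section to its ports and the proxy it refers to."""
    plan = {}
    for name in containers.sections():
        section = containers[name]
        proxy = proxies[section["proxy_name"]]  # KeyError if the profile is missing
        plan[name] = {
            "ports": expand_ports(section["port"]),
            "proxy": f'{proxy["type"].lower()}://{proxy["host"]}:{proxy["port"]}',
            "net": section["net"],
        }
    return plan


if __name__ == "__main__":
    print(resolve(load(CONTAINERS_CFG), load(PROXIES_CFG)))
```

A lookup like this fails early when proxy_name points at a profile that does not exist, which is an easy mistake to make when editing the two files by hand.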

I2P

Installation:

Go to the I2P website and follow the installation instructions.

Configuration

Splash Dockers:

Edit config/containers.cfg:

[default_splash_i2p] # section name: splash name
proxy_name=default_i2p
port=8053-8055
cpu=1
memory=1
maxrss=2000
description=default splash i2p
net=host

Proxies:

Edit config/proxies_profiles.cfg:

[default_i2p]
host=127.0.0.1
port=4444
type=HTTP
description=i2p default proxy
crawler_type=i2p

Web proxy

SQUID

API

api/v1/ping

api/v1/version

api/v1/get/session_uuid

api/v1/get/proxies/all

api/v1/get/splash/all

api/v1/splash/restart

api/v1/splash/kill
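
The endpoints above can be called from any HTTP client. The sketch below builds the URLs and fetches a response with Python's standard library. Several things here are assumptions, because this README does not state them: the base URL and port are placeholders, the requests are sent as plain GET, no authentication is sent, and the response body is returned as raw text since its format is not documented here. The names endpoint_url and call are hypothetical.

```python
import urllib.request

API_PREFIX = "api/v1"

# Endpoint paths exactly as listed in this README.
ENDPOINTS = (
    "ping",
    "version",
    "get/session_uuid",
    "get/proxies/all",
    "get/splash/all",
    "splash/restart",
    "splash/kill",
)


def endpoint_url(base_url: str, endpoint: str) -> str:
    """Build the full URL for one of the listed endpoints."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return f"{base_url.rstrip('/')}/{API_PREFIX}/{endpoint}"


def call(base_url: str, endpoint: str, timeout: float = 10.0) -> str:
    """Send a GET request (assumed method) and return the raw body text."""
    url = endpoint_url(base_url, endpoint)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder address: replace with where your manager actually listens.
    base = "http://127.0.0.1:7001"
    for name in ("ping", "version"):
        print(name, "->", call(base, name))
```

The example deliberately only calls ping and version. splash/restart and splash/kill change the state of the running containers, so check how your deployment expects them to be called before scripting them.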