ail-project / ail-splash-manager

Deprecated: AIL crawler has been upgraded to https://github.com/ail-project/lacus
GNU General Public License v3.0

Crawler Error / Down #5

Closed: ITSEC-DACHSER closed this issue 1 year ago

ITSEC-DACHSER commented 2 years ago

Hi!

I installed the AIL-Splash-Manager on the same machine that AIL itself runs on (we only have this single machine), but I'm not able to get the crawlers running because of an error: [image]

Screen of the ail-splash-manager:

Launching all Splash dockers ...

 * Serving Flask app 'Flask_server'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on https://127.0.0.1:7001
 * Running on https://192.168.158.2:7001
Press CTRL+C to quit
127.0.0.1 - - [19/Aug/2022 14:28:33] "GET /api/v1/ping HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:40] "GET /api/v1/get/session_uuid HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:40] "GET /api/v1/ping HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:40] "GET /api/v1/get/proxies/all HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:41] "GET /api/v1/get/splash/all HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:41] "GET /api/v1/ping HTTP/1.1" 200 -
127.0.0.1 - - [19/Aug/2022 14:28:42] "GET /api/v1/ping HTTP/1.1" 200 -

Here is the output of ./LAUNCH.sh -t:


 ./LAUNCH.sh -t
 #### containers config: ####
# proxy_name: proxy name (defined in proxies_profiles.cfg)
# port: single port or port range (ex: 8050 or 8050-8052)
# cpu: max number of cpu allocated
# memory: RAM (G) allocated
# maxrss: max unbound in-memory cache (Mb, Restart Splash when full)
# description: docker description
[default_splash_tor]
proxy_name=default_tor
port=8050-8052
cpu=1
memory=1
maxrss=2000
description= default splash tor
net=bridge

# Splash with SQUID proxy
[web_splash]
proxy_name=web_proxy
port=8060
cpu=1
memory=1
maxrss=2000
description= web splash
net=bridge

# Splash with I2P proxy
#[default_splash_i2p] # section name: splash name
#proxy_name=default_i2p
#port=8053-8055
#cpu=1
#memory=1
#maxrss=2000
#description=default splash i2p
#net=host
#### proxies config: ####
# Tor: torrc default proxy
# use The torproject proxy https://2019.www.torproject.org/docs/debian
# (up to date, solve issues with v3 onion addresses)

# proxy name
[default_tor]
# proxy host
host=172.17.0.1
# proxy port
port=9050
# proxy type
type=SOCKS5
# proxy description
description=tor default proxy
# crawler type (tor or i2p or web)
crawler_type=tor

# SQUID proxy
[web_proxy]
host=172.17.0.1
port=3128
type=HTTP
description=web proxy
crawler_type=web

# I2P proxy
#[default_i2p]
#host=127.0.0.1
#port=4444
#type=HTTP
#description=i2p default proxy
#crawler_type=i2p

#### #### ####

 Launching Tests ...

Splash List:
b'6a30543d58f9   scrapinghub/splash   "python3 /app/bin/sp"   58 seconds ago   Up 57 seconds   0.0.0.0:8060->8050/tcp   gallant_moser\n719b9459bb82   scrapinghub/splash   "python3 /app/bin/sp"   58 seconds ago   Up 57 seconds   0.0.0.0:8052->8050/tcp   hardcore_nightingale\ne237c3d9a36d   scrapinghub/splash   "python3 /app/bin/sp"   59 seconds ago   Up 58 seconds   0.0.0.0:8051->8050/tcp   pensive_greider\nc8c36770f590   scrapinghub/splash   "python3 /app/bin/sp"   59 seconds ago   Up 58 seconds   0.0.0.0:8050->8050/tcp   strange_gould\n'

Testing Splash Docker 6a30543d58f9:
success

Testing Splash Docker 719b9459bb82:
success

Testing Splash Docker e237c3d9a36d:
success

Testing Splash Docker c8c36770f590:
success

Running Docker containers:


# docker container ls
CONTAINER ID   IMAGE                COMMAND                  CREATED         STATUS         PORTS                    NAMES
6a30543d58f9   scrapinghub/splash   "python3 /app/bin/sp…"   6 minutes ago   Up 6 minutes   0.0.0.0:8060->8050/tcp   gallant_moser
719b9459bb82   scrapinghub/splash   "python3 /app/bin/sp…"   6 minutes ago   Up 6 minutes   0.0.0.0:8052->8050/tcp   hardcore_nightingale
e237c3d9a36d   scrapinghub/splash   "python3 /app/bin/sp…"   6 minutes ago   Up 6 minutes   0.0.0.0:8051->8050/tcp   pensive_greider
c8c36770f590   scrapinghub/splash   "python3 /app/bin/sp…"   6 minutes ago   Up 6 minutes   0.0.0.0:8050->8050/tcp   strange_gould

Under the onion crawler, both ports are listed (8050 Tor + 8060 web). Is this maybe the problem? [image]
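
As a cross-check, the port-to-crawler mapping implied by the two config files can be worked out with a small Python sketch. This is not code from AIL or ail-splash-manager: the function names are hypothetical, and the config text is a trimmed copy of the LAUNCH.sh output above. It only shows what the configuration itself says, namely that 8050-8052 belong to the tor crawler and 8060 to the web crawler.

```python
import configparser

# Trimmed copies of the two config sections printed by ./LAUNCH.sh -t.
CONTAINERS_CFG = """
[default_splash_tor]
proxy_name=default_tor
port=8050-8052

[web_splash]
proxy_name=web_proxy
port=8060
"""

PROXIES_CFG = """
[default_tor]
host=172.17.0.1
port=9050
type=SOCKS5
crawler_type=tor

[web_proxy]
host=172.17.0.1
port=3128
type=HTTP
crawler_type=web
"""


def expand_ports(spec):
    """Turn '8050' or '8050-8052' into a list of port numbers."""
    if "-" in spec:
        first, last = spec.split("-", 1)
        return list(range(int(first), int(last) + 1))
    return [int(spec)]


def ports_by_crawler_type(containers_text, proxies_text):
    """Group Splash ports by the crawler_type of the proxy each container uses."""
    containers = configparser.ConfigParser()
    containers.read_string(containers_text)
    proxies = configparser.ConfigParser()
    proxies.read_string(proxies_text)

    mapping = {}
    for name in containers.sections():
        proxy_name = containers[name]["proxy_name"]
        crawler_type = proxies[proxy_name]["crawler_type"]
        ports = expand_ports(containers[name]["port"])
        mapping.setdefault(crawler_type, []).extend(ports)
    return mapping


if __name__ == "__main__":
    # With the config above: {'tor': [8050, 8051, 8052], 'web': [8060]}
    print(ports_by_crawler_type(CONTAINERS_CFG, PROXIES_CFG))
```

If the AIL interface lists 8060 under the onion crawler, that differs from what this config describes, which may be worth mentioning when reporting the problem.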

I checked the web proxy configuration in the project description (https://github.com/ail-project/ail-splash-manager#web-proxy), but it is not clear to me which "/etc/squid/squid.conf" I need to configure. Squid is not installed by default on the host by the ail-splash-manager install script. Or do I need to change it inside the Docker container?

Is there anything I can debug further?
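
One check that can be run from the host is whether anything is listening on the proxy addresses from the proxies config (172.17.0.1:9050 for Tor, 172.17.0.1:3128 for Squid). The following is a minimal Python sketch under that assumption; `is_reachable` and the `PROXIES` table are hypothetical names, not part of ail-splash-manager. A successful TCP connection only shows that a listener exists, not that the proxy works or that the Splash containers can reach it.

```python
import socket

# Host/port pairs copied from the proxies config printed by ./LAUNCH.sh -t.
PROXIES = {
    "default_tor": ("172.17.0.1", 9050),
    "web_proxy": ("172.17.0.1", 3128),
}


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    for name, (host, port) in PROXIES.items():
        state = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{name} {host}:{port} -> {state}")
```

If web_proxy shows as not reachable, that would be consistent with Squid not being installed on the host.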

Please let me know if more logs are needed. Thanks for your help!

Terrtia commented 1 year ago

Fixed in AIL v5.0 release: AIL crawler has been upgraded to Lacus