CIRCL / AIL-framework

AIL framework - Analysis Information Leak framework. Project moved to https://github.com/ail-project
https://github.com/ail-project/ail-framework
GNU Affero General Public License v3.0

torcrawler.py timeout 504 #352

Closed xme closed 1 year ago

xme commented 5 years ago

Since I upgraded my AIL instance, I can't crawl any onion site. All requests return a "504" error. Is there a way to increase the timeout to reach the site via Tor, or is it related to another issue?

Example:

2019-05-27 12:09:57.796865 [events] {"args": {"uid": 140571194442752, "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "Accept-Language": "en"}, "render_all": 1, "har": 1, "wait": 10, "png": 1, "url": "http://winkledgargsurly.onion", "html": 1}, "timestamp": 1558958997, "active": 0, "user-agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "method": "POST", "path": "/render.json", "maxrss": 85020, "rendertime": 30.014784336090088, "client_ip": "172.17.0.1", "qsize": 0, "error": {"type": "GlobalTimeoutError", "error": 504, "description": "Timeout exceeded rendering page", "info": {"timeout": 30}}, "status_code": 504, "fds": 18, "_id": 140571194442752, "load": [4.21, 4.52, 4.81]}
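For anyone reading entries like this one, a minimal sketch of pulling out the fields that matter may help. It assumes the log line has the shape shown above (a timestamp, the literal `[events]` tag, then a JSON object); the helper name `parse_splash_event` is made up for illustration and is not part of AIL or Splash.

```python
import json

def parse_splash_event(line):
    """Split an '[events]' log line into its timestamp prefix and JSON payload."""
    prefix, _, payload = line.partition(" [events] ")
    return prefix, json.loads(payload)

# Shortened copy of the entry above, keeping only the fields inspected below.
sample = ('2019-05-27 12:09:57.796865 [events] {"args": {"wait": 10, '
          '"url": "http://winkledgargsurly.onion"}, "rendertime": 30.014784336090088, '
          '"error": {"type": "GlobalTimeoutError", "error": 504, '
          '"description": "Timeout exceeded rendering page", "info": {"timeout": 30}}, '
          '"status_code": 504}')

when, event = parse_splash_event(sample)
error = event.get("error", {})

summary = {
    "logged_at": when,
    "url": event["args"]["url"],
    "status_code": event["status_code"],
    "error_type": error.get("type"),
    "timeout_s": error.get("info", {}).get("timeout"),
    "rendertime_s": event["rendertime"],
}
print(summary)
```

In this entry the render time (about 30.01 s) sits right at the reported timeout of 30 s, which suggests the page simply did not finish loading within the render budget.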

Terrtia commented 5 years ago

hey @xme !

Do you have the same issue with regular websites (crawled via the UI)?

The default Tor proxy provided by the package management system is not up to date. Using the Tor proxy provided by the Tor Project (#344) may solve the problem.

You're right, I should add an option to change the default Splash timeout.
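Until such an option exists, here is a rough sketch of what a longer render budget could look like at the Splash level. It is not AIL's crawler code: the function name `build_render_request` and the 60-second value are assumptions for illustration. The argument names (`url`, `html`, `png`, `har`, `render_all`, `wait`, `timeout`) are the ones visible in the log above or documented for Splash's `render.json` endpoint.

```python
import json

def build_render_request(url, timeout=60.0, wait=10.0):
    """Build a JSON body for Splash's render.json with an explicit timeout.

    Hypothetical helper: the defaults here are illustrative, not AIL's.
    """
    # Sanity check (an assumption of this sketch): a wait that is not
    # shorter than the overall timeout could never complete in time.
    if wait >= timeout:
        raise ValueError("wait must be shorter than timeout")
    return {
        "url": url,
        "html": 1,
        "png": 1,
        "har": 1,
        "render_all": 1,
        "wait": wait,
        "timeout": timeout,
    }

payload = build_render_request("http://winkledgargsurly.onion", timeout=60.0)
body = json.dumps(payload)
print(body)
```

The body would be sent as a POST to `/render.json` on the Splash instance. Splash caps the `timeout` argument at a server-side maximum, which its documentation says can be raised with the `--max-timeout` startup option, so a larger per-request value may also require restarting Splash with that option.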

xme commented 5 years ago

Only Onion websites apparently... How can I switch to the tor proxy provided by TorProject?

Terrtia commented 5 years ago

Follow these installation steps: https://2019.www.torproject.org/docs/debian.html.en#ubuntu (Option two: Tor on Ubuntu or Debian)

This should overwrite your /etc/tor/torrc configuration file. You need to edit this file as described:

(https://github.com/CIRCL/AIL-framework/blob/master/HOWTO.md#installationconfiguration)

annetteshajan commented 4 years ago

@xme did you get a solution for this?

xme commented 4 years ago

@annetteshajan I'm running the latest stable tor package (as suggested) but it did not improve. Most of the crawled Onion sites are down. I tested some of them via a Tor browser and it's also impossible to reach them. I presume that they are indeed down. Sometimes, I get a peak of available sites... Strange...

annetteshajan commented 4 years ago

@xme Are you sure your tor is installed correctly? When I run curl --socks5 localhost:9050 --socks5-hostname localhost:9050 -s https://check.torproject.org/ | cat | grep -m 1 Congratulations | xargs I do not get any output.. Ideally I should, however the tor service does say that it is running in my system
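When that curl check prints nothing, one quick thing to rule out is that nothing is accepting connections on the SOCKS port at all. A minimal sketch, assuming Tor's SOCKS listener is on localhost:9050 as in the command above (the function name `socks_port_open` is made up for illustration):

```python
import socket

def socks_port_open(host="localhost", port=9050, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False

print("SOCKS port reachable:", socks_port_open())
```

This only shows that something is listening on the port. It does not prove that the listener is Tor or that circuits are being built, so the curl check against check.torproject.org remains the real test.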

annetteshajan commented 4 years ago

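Another question, are you using a local or remote instance of the Splash server? My remote one does not seem to work. Every time I run sudo ./bin/torcrawler/launch_splash_crawler.sh -f configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1 it gives:

  • A screen is already launched, please kill it before creating another one. I have even killed all the screens, also reinstalled it, it still gives same output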

xme commented 4 years ago

> @xme Are you sure your tor is installed correctly? When I run curl --socks5 localhost:9050 --socks5-hostname localhost:9050 -s https://check.torproject.org/ | cat | grep -m 1 Congratulations | xargs I do not get any output.. Ideally I should, however the tor service does say that it is running in my system

# curl --socks5 localhost:9050 --socks5-hostname localhost:9050 -s https://check.torproject.org/ | cat | grep -m 1 Congratulations | xargs
Congratulations. This browser is configured to use Tor.

xme commented 4 years ago

> Another question, are you using a local or remote instance of the Splash server? My remote one does not seem to work. Every time I run sudo ./bin/torcrawler/launch_splash_crawler.sh -f configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1 it gives:
>
>   • A screen is already launched, please kill it before creating another one. I have even killed all the screens, also reinstalled it, it still gives same output

Default setup... In a docker, 3 instances

annetteshajan commented 4 years ago

> Example:
>
> 2019-05-27 12:09:57.796865 [events] {"args": {"uid": 140571194442752, "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "Accept-Language": "en"}, "render_all": 1, "har": 1, "wait": 10, "png": 1, "url": "http://winkledgargsurly.onion", "html": 1}, "timestamp": 1558958997, "active": 0, "user-agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "method": "POST", "path": "/render.json", "maxrss": 85020, "rendertime": 30.014784336090088, "client_ip": "172.17.0.1", "qsize": 0, "error": {"type": "GlobalTimeoutError", "error": 504, "description": "Timeout exceeded rendering page", "info": {"timeout": 30}}, "status_code": 504, "fds": 18, "_id": 140571194442752, "load": [4.21, 4.52, 4.81]}

What command is this? @xme

mokaddem commented 4 years ago

@annetteshajan Regarding https://github.com/CIRCL/AIL-framework/issues/352#issuecomment-641064855: it seems that you already have a screen running for the root user. Could you kill it before relaunching the mentioned script?

mokaddem commented 4 years ago

Never mind. This seems to be solved in https://github.com/CIRCL/AIL-framework/issues/352#issuecomment-641064855

Terrtia commented 1 year ago

Fixed in AIL v5.0