Closed. xme closed this issue 1 year ago
Hey @xme!
Do you have the same issue with regular websites (crawled via the UI)?
The default Tor proxy provided by the package management system is not up to date. Using the Tor proxy provided by the Tor Project (#344) may solve the problem.
You're right, I should add an option to change the default Splash timeout.
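For reference, Splash's HTTP API accepts a per-request timeout argument on render.json (30 seconds by default, which matches the "timeout": 30 in the log from the report), and the server caps it with its --max-timeout startup option (90 seconds by default). Below is a minimal sketch of building such a request. It assumes a Splash instance at http://localhost:8050 (the port is taken from the launch command in this thread); build_render_request is a hypothetical helper and not part of AIL.

```python
import json
import urllib.request

SPLASH_URL = "http://localhost:8050"  # assumption: local Splash on port 8050


def build_render_request(url, timeout=60, wait=10, splash_url=SPLASH_URL):
    """Build (but do not send) a POST request for Splash's render.json endpoint."""
    payload = {
        "url": url,
        "timeout": timeout,  # overall render budget in seconds (Splash default: 30)
        "wait": wait,        # seconds to wait after the page has loaded
        "html": 1,
        "png": 1,
        "har": 1,
    }
    return urllib.request.Request(
        splash_url + "/render.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_render_request("http://winkledgargsurly.onion", timeout=60)
print(req.full_url)
print(json.loads(req.data)["timeout"])
# To actually send it: urllib.request.urlopen(req, timeout=90)
```

A timeout above the server cap is rejected by Splash, so values above 90 seconds would also need Splash to be started with a larger --max-timeout.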
Only Onion websites apparently... How can I switch to the tor proxy provided by TorProject?
Follow these installation steps: https://2019.www.torproject.org/docs/debian.html.en#ubuntu (Option two: Tor on Ubuntu or Debian)
This should overwrite your /etc/tor/torrc configuration file.
You need to edit this file as described in the HOWTO (https://github.com/CIRCL/AIL-framework/blob/master/HOWTO.md#installationconfiguration).
In /etc/tor/torrc, set:
SOCKSPort 0.0.0.0:9050
(or SOCKSPort 172.17.0.1:9050)
SOCKSPolicy accept 172.17.0.0/16
(On Linux with Docker, the host's address as seen from the containers is 172.17.0.1; adapt this for other platforms.)
Then restart Tor:
sudo service tor restart
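After restarting Tor, it can help to confirm that the SOCKS port is actually reachable on the address the Splash containers will use. The sketch below only checks that something accepts TCP connections on that address and port. It does not prove that traffic is routed through Tor (the curl check against check.torproject.org used later in this thread does that). socks_port_reachable is a hypothetical helper name, and the default host assumes the standard Docker bridge address from the configuration above.

```python
import socket


def socks_port_reachable(host="172.17.0.1", port=9050, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example usage (run on the Docker host or inside a container):
#   socks_port_reachable()                   # Docker bridge address
#   socks_port_reachable("127.0.0.1", 9050)  # local loopback
```

If this returns False for 172.17.0.1 but True for 127.0.0.1, Tor is probably still bound to localhost only, which points back at the SOCKSPort line in /etc/tor/torrc.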
@xme did you get a solution for this?
@annetteshajan I'm running the latest stable Tor package (as suggested), but it did not improve things. Most of the crawled Onion sites are down. I tested some of them via Tor Browser and they are also impossible to reach, so I presume that they are indeed down. Sometimes I get a peak of available sites... Strange...
@xme Are you sure your Tor is installed correctly? When I run curl --socks5 localhost:9050 --socks5-hostname localhost:9050 -s https://check.torproject.org/ | cat | grep -m 1 Congratulations | xargs I do not get any output. Ideally I should; however, the tor service does say that it is running on my system.
Another question: are you using a local or remote instance of the Splash server? My remote one does not seem to work. Every time I run sudo ./bin/torcrawler/launch_splash_crawler.sh -f configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1 it gives:
- A screen is already launched, please kill it before creating another one. I have even killed all the screens and also reinstalled it, but it still gives the same output.
> @xme Are you sure your Tor is installed correctly? When I run curl --socks5 localhost:9050 --socks5-hostname localhost:9050 -s https://check.torproject.org/ | cat | grep -m 1 Congratulations | xargs I do not get any output. Ideally I should; however, the tor service does say that it is running on my system.
# curl --socks5 localhost:9050 --socks5-hostname localhost:9050 -s https://check.torproject.org/ | cat | grep -m 1 Congratulations | xargs
Congratulations. This browser is configured to use Tor.
> Another question: are you using a local or remote instance of the Splash server? My remote one does not seem to work. Every time I run sudo ./bin/torcrawler/launch_splash_crawler.sh -f configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1 it gives:
> - A screen is already launched, please kill it before creating another one. I have even killed all the screens and also reinstalled it, but it still gives the same output.
Default setup... In a docker, 3 instances
Example:
2019-05-27 12:09:57.796865 [events] {"args": {"uid": 140571194442752, "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "Accept-Language": "en"}, "render_all": 1, "har": 1, "wait": 10, "png": 1, "url": "http://winkledgargsurly.onion", "html": 1}, "timestamp": 1558958997, "active": 0, "user-agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "method": "POST", "path": "/render.json", "maxrss": 85020, "rendertime": 30.014784336090088, "client_ip": "172.17.0.1", "qsize": 0, "error": {"type": "GlobalTimeoutError", "error": 504, "description": "Timeout exceeded rendering page", "info": {"timeout": 30}}, "status_code": 504, "fds": 18, "_id": 140571194442752, "load": [4.21, 4.52, 4.81]}
What command is this? @xme
https://github.com/CIRCL/AIL-framework/issues/352#issuecomment-641064855 @annetteshajan It seems that you have a screen already running for the root user. Could you kill it before relaunching the mentioned script?
Nevermind. Seems to be solved in https://github.com/CIRCL/AIL-framework/issues/352#issuecomment-641064855
Fixed in AIL v5.0
Since I upgraded my AIL instance, I can't crawl any onion site. All requests return a "504" error. Is there a way to increase the timeout to reach the site via Tor, or is it related to another issue?
Example:
2019-05-27 12:09:57.796865 [events] {"args": {"uid": 140571194442752, "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "Accept-Language": "en"}, "render_all": 1, "har": 1, "wait": 10, "png": 1, "url": "http://winkledgargsurly.onion", "html": 1}, "timestamp": 1558958997, "active": 0, "user-agent": "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0", "method": "POST", "path": "/render.json", "maxrss": 85020, "rendertime": 30.014784336090088, "client_ip": "172.17.0.1", "qsize": 0, "error": {"type": "GlobalTimeoutError", "error": 504, "description": "Timeout exceeded rendering page", "info": {"timeout": 30}}, "status_code": 504, "fds": 18, "_id": 140571194442752, "load": [4.21, 4.52, 4.81]}
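To make lines like the one above easier to triage, here is a rough sketch that splits a Splash [events] line into its timestamp and JSON payload and pulls out the fields relevant to a timeout. The function names parse_splash_event and summarize_error are hypothetical, and the sample is a shortened copy of the log line above with unused fields omitted.

```python
import json


def parse_splash_event(line):
    """Split a Splash '[events]' log line into (timestamp, payload dict)."""
    prefix, _, payload = line.partition(" [events] ")
    if not payload:
        raise ValueError("not a Splash [events] line")
    return prefix, json.loads(payload)


def summarize_error(event):
    """Return a short summary of the error in an event, or None if there is none."""
    error = event.get("error")
    if error is None:
        return None
    info = error.get("info")
    return {
        "url": event.get("args", {}).get("url"),
        "status_code": event.get("status_code"),
        "type": error.get("type"),
        "timeout": info.get("timeout") if isinstance(info, dict) else None,
        "rendertime": event.get("rendertime"),
    }


# Shortened copy of the log line above (fields not used here are omitted).
sample = (
    '2019-05-27 12:09:57.796865 [events] {"args": {"wait": 10, '
    '"url": "http://winkledgargsurly.onion"}, "rendertime": 30.014784336090088, '
    '"error": {"type": "GlobalTimeoutError", "error": 504, '
    '"description": "Timeout exceeded rendering page", "info": {"timeout": 30}}, '
    '"status_code": 504}'
)

when, event = parse_splash_event(sample)
print(when, summarize_error(event))
```

In this example the render time (about 30.01 s) sits right at the 30 s timeout, which suggests the request ran into Splash's global timeout and that the 504 did not come from the site itself. With "wait": 10 counted inside that budget, roughly 20 s would be left for building the Tor circuit and loading the page.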