ciscocsirt / malspider

Malspider is a web spidering framework that detects characteristics of web compromises.
BSD 3-Clause "New" or "Revised" License

Can not connect to ghostdriver #18

Closed r3comp1le closed 7 years ago

r3comp1le commented 7 years ago

Is it failing because it's trying HTTPS? Also, notice that when an IP address is entered, it adds www. to it.

2016-11-30 22:00:40+0000 [scrapy] INFO: Scrapy 0.24.4 started (bot: full_domain)
2016-11-30 22:00:40+0000 [scrapy] INFO: Optional features available: ssl, http11, django
2016-11-30 22:00:40+0000 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'malspider.spiders', 'SPIDER_MODULES': ['malspider.spiders'], 'LOG_FILE': 'logs/malspider/full_domain/6bc999bcb74811e6b3e7129119453e14.log', 'USER_AGENT': 'Mozilla/5.0 (Android; Tablet; rv:30.0) Gecko/30.0 Firefox/30.0', 'BOT_NAME': 'full_domain'}
2016-11-30 22:00:40+0000 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2016-11-30 22:00:40+0000 [scrapy] INFO: Enabled downloader middlewares: RandomUserAgentMiddleware, HttpAuthMiddleware, DownloadTimeoutMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-11-30 22:00:40+0000 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, WebdriverSpiderMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-11-30 22:00:40+0000 [scrapy] INFO: Enabled item pipelines: DuplicateFilterPipeline, WhitelistFilterPipeline, MySQLPipeline
2016-11-30 22:00:40+0000 [full_domain] INFO: Spider opened
2016-11-30 22:00:40+0000 [full_domain] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-11-30 22:00:40+0000 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-11-30 22:00:40+0000 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2016-11-30 22:00:40+0000 [scrapy] DEBUG: Downloading https://test.com with webdriver
2016-11-30 22:01:09+0000 [full_domain] ERROR: Error downloading <GET https://test.com>
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
        self.__bootstrap_inner()
      File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
        self.run()
      File "/usr/lib/python2.7/threading.py", line 754, in run
        self.__target(*self.__args, **self.__kwargs)
    --- <exception caught here> ---
      File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 191, in _worker
        result = context.call(ctx, function, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
        return self.currentContext().callWithContext(ctx, func, *args, **kw)
      File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
        return func(*args,**kw)
      File "build/bdist.linux-x86_64/egg/malspider/scrapy_webdriver/download.py", line 66, in _download_request

      File "build/bdist.linux-x86_64/egg/malspider/scrapy_webdriver/manager.py", line 75, in webdriver

      File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/webdriver.py", line 50, in __init__
        self.service.start()
      File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/phantomjs/service.py", line 81, in start
        raise WebDriverException("Can not connect to GhostDriver")
    selenium.common.exceptions.WebDriverException: Message: Can not connect to GhostDriver
jasheppa5 commented 7 years ago

Hmm... the "Can not connect to GhostDriver" error sounds like PhantomJS either isn't installed or isn't working properly. Can you go to the command line and run "phantomjs --version"? The dependencies script attempts to install 2.1.1, so you should expect to see version 2.1.1.
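The version check suggested above can also be scripted. This is only a minimal sketch, assuming Python 3.7+ and that the binary is named phantomjs on the PATH; the helper name phantomjs_version is hypothetical and is not part of malspider.

```python
import shutil
import subprocess

# Version the dependencies script is said to install (taken from the comment above).
EXPECTED_VERSION = "2.1.1"


def phantomjs_version(binary="phantomjs"):
    """Return the string printed by `<binary> --version`, or None on any failure."""
    path = shutil.which(binary)
    if path is None:
        return None  # binary is not on PATH
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # could not be started, or hung
    if result.returncode != 0:
        return None  # non-zero exit, e.g. the process crashed on startup
    return result.stdout.strip()
```

Comparing phantomjs_version() against EXPECTED_VERSION then separates the "missing or broken" case (None) from the "old package" case (a different version string).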

If phantomjs can't be found or the version is not 2.1.1, you can update it. Sometimes distros come with an older phantomjs package and it doesn't update properly (this was seen by a few other users). You can try upgrading via apt-get; if that doesn't work, use these commands, which I pulled from the dependencies script, to update phantomjs:

wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
bunzip2 phantomjs-2.1.1-linux-x86_64.tar.bz2
tar xvf phantomjs-2.1.1-linux-x86_64.tar
sudo cp phantomjs*/bin/phantomjs /usr/bin/phantomjs

Second, malspider auto-generates start URLs. If you have a domain, say test.com, malspider builds these start URLs:

http://test.com
http://www.test.com
https://test.com
https://www.test.com

This is to handle scenarios where a website is missing an A record for either www.test.com or test.com. Unfortunately, I did not test direct IP addresses, but I can add a quick check to ensure www is not added for them. The site should still be scanned, though, since http:// is one of the start URLs; if it isn't, then I may have an additional bug to fix.
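For illustration, the start URL generation described above, together with the proposed IP address check, might look roughly like the following. This is a sketch in Python 3 and not malspider's actual code (malspider runs on Python 2.7); the function name build_start_urls is hypothetical.

```python
import ipaddress


def build_start_urls(host):
    """Build candidate start URLs for a bare domain or IP address.

    Domains get both the bare and the www. variant over http and https.
    IP addresses skip the www. variant, as proposed in the comment above.
    """
    host = host.strip().lower()
    try:
        ipaddress.ip_address(host)  # raises ValueError for anything that is not an IP
        is_ip = True
    except ValueError:
        is_ip = False

    hosts = [host]
    if not is_ip:
        hosts.append("www." + host)

    return ["%s://%s" % (scheme, h) for scheme in ("http", "https") for h in hosts]
```

With this sketch, build_start_urls("test.com") yields the four URLs listed above, while an input such as 192.0.2.10 yields only the http:// and https:// URLs for the address itself.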

-James


r3comp1le commented 7 years ago

Ah ok, I never got past the HTTPS part, so that's why I didn't see the other iterations. Looks like something is screwy with phantomjs. Will attempt to reinstall it and report back.

phantomjs --version

QXcbConnection: Could not connect to display
PhantomJS has crashed. Please read the bug reporting guide at
<http://phantomjs.org/bug-reporting.html> and file a bug report.
Aborted (core dumped)
r3comp1le commented 7 years ago

This worked

wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
bzip2 -d phantomjs-2.1.1-linux-x86_64.tar.bz2
tar -xvf phantomjs-2.1.1-linux-x86_64.tar
cp phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/bin/phantomjs
kumudraj commented 4 years ago

In case you are still not able to connect, run the commands below first:

sudo apt-get install libfreetype6 libfreetype6-dev
sudo apt-get install libfontconfig1 libfontconfig1-dev

and then install PhantomJS:

cd ~
export PHANTOM_JS="phantomjs-2.1.1-linux-x86_64"
wget https://bitbucket.org/ariya/phantomjs/downloads/$PHANTOM_JS.tar.bz2
sudo tar xvjf $PHANTOM_JS.tar.bz2
sudo mv $PHANTOM_JS /usr/local/share
sudo ln -sf /usr/local/share/$PHANTOM_JS/bin/phantomjs /usr/local/bin