NexvisionLab / Darkweb-search-engine

Dark Web & Deep Web Search Engine. Data crawler and indexer for the Darkweb, OSINT tools for the Dark Web.
GNU Affero General Public License v3.0
crwaler darknet darkweb deepweb osint osint-tool search-engine tor

Nexvision search engine

Features

Components

Elasticsearch

The Elasticsearch cluster consists of two Elasticsearch instances for high availability (HA) and load balancing. It stores the scraped page data and serves the search queries.
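As an illustration of how the stored page data might be searched, the sketch below builds a request body using Elasticsearch's standard `multi_match` query. The field names `title` and `body` are assumptions made for this example; the project's actual index mapping is not described here.

```python
import json


def build_search_body(text, size=10):
    """Build an Elasticsearch query DSL body for a full-text search."""
    return {
        "size": size,  # maximum number of hits to return
        "query": {
            "multi_match": {
                "query": text,
                # Hypothetical field names; adjust to the real index mapping.
                "fields": ["title", "body"],
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into Kibana or sent with any HTTP client.
    print(json.dumps(build_search_body("marketplace"), indent=2))
```

The function only constructs the JSON body, so it stays independent of any particular Elasticsearch client version.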

Kibana

It runs on port 5601 and can be used to inspect the data in Elasticsearch.

Web-General

The web interface for the domain search engine. It runs on port 7000.

MySQL

It stores the domains, page URLs, Bitcoin addresses, etc.

TOR Proxy

Used to access the onion pages. Ten proxy containers are deployed, and HAProxy distributes the traffic across them.
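To make the traffic distribution concrete, the sketch below imitates a round-robin rotation over ten proxies. It is only an illustration: the balancing algorithm configured in this project's HAProxy is not stated here, and the proxy host names and port are placeholders.

```python
from itertools import cycle

# Placeholder addresses; the real container names and ports are not given here.
PROXIES = [f"http://tor-proxy-{i}:8118" for i in range(1, 11)]


def round_robin(proxies):
    """Return an endless iterator that hands out proxies in turn."""
    return cycle(proxies)


picker = round_robin(PROXIES)

# Twelve consecutive requests: the 11th wraps around to the first proxy again.
first_twelve = [next(picker) for _ in range(12)]

if __name__ == "__main__":
    for number, proxy in enumerate(first_twelve, start=1):
        print(f"request {number:2d} -> {proxy}")
```

In the deployed stack the scraper would talk to a single HAProxy endpoint, and the rotation would happen inside HAProxy rather than in client code.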

Scraper

It gets the domain list from the MySQL DB, harvests pages and new domains from onion sites through the TOR proxies, and stores the domains and page data in Elasticsearch and MySQL. It is based on the Python Scrapy framework.
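The step of harvesting new domains from a fetched page can be sketched with a regular expression. This is a minimal, standalone example and not the project's actual extraction code; the function name `extract_onion_domains` is hypothetical.

```python
import re

# v2 onion addresses are 16 base32 characters, v3 addresses are 56.
ONION_RE = re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b")


def extract_onion_domains(html):
    """Return the unique onion domains found in a page, in first-seen order."""
    seen = []
    for match in ONION_RE.finditer(html.lower()):
        domain = match.group(0)
        if domain not in seen:
            seen.append(domain)
    return seen


if __name__ == "__main__":
    sample = '<a href="http://abcdefghijklmnop.onion/index">link</a>'
    print(extract_onion_domains(sample))
```

Domains found this way would then be queued for crawling, which is how the domain list can grow from an initial seed list.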

Installation

Clone the project, then build and start the Docker images defined in docker-compose.

docker-compose build
docker-compose up -d

Build the scraper image.

docker build --tag scraper_crawler ./

Run the scraper.

docker run -d --name darkweb-search-engine-onion-crawler --network=darkweb-search-engine_default scraper_crawler /opt/torscraper/scripts/start_onion_scrapy.sh

After the first deployment, initialize the indexes on Elasticsearch.

docker exec darkweb-search-engine-onion-crawler /opt/torscraper/scripts/elasticsearch_migrate.sh

Import the initial domain list.

docker exec darkweb-search-engine-onion-crawler /opt/torscraper/scripts/push_list.sh /opt/torscraper/onions_list/onions.txt &
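Once the containers are running, a quick reachability check of the ports named under Components can confirm that the stack came up. The sketch below assumes the services are published on `localhost`; adjust the host if the deployment differs.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports come from the Components section; "localhost" is an assumption.
    for name, port in [("Kibana", 5601), ("Web-General", 7000)]:
        status = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {status}")
```

A successful TCP connection only shows that something is listening on the port; it does not verify that the service behind it is healthy.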