This repository contains an HTTP REST API and a command-line program designed for efficient data gathering and analysis through web crawling using the TOR network. While the program is primarily designed to work seamlessly with TorBot, the API and CLI can also operate independently.
-url: URL to crawl. (Required argument)
-depth: Depth of the tree. (default: 1)
The program employs the TOR network for enhanced privacy and security during web crawling. TOR settings can be configured using environment variables or overridden using CLI flags.
-socks5-host: Specify the SOCKS5 proxy host (default: localhost / 127.0.0.1)
-socks5-port: Specify the SOCKS5 proxy port (default: 9050)
-disable-socks5: Run the program without the SOCKS5 proxy.
-server-host: Specify the host that the server runs on (default: localhost / 127.0.0.1)
-server-port: Specify the port that the server runs on (default: 8081)
-s: Run the program as a service
-d: Download the results to an Excel spreadsheet (.xlsx)
-f: Output format for the results. Options are list or tree. (default: list)
To start the HTTP server and initiate crawling, use the following command:
go run cmd/main/gotor.go -s
With an alternate host and port for the server and the SOCKS5 proxy:
go run cmd/main/gotor.go -s -server-host 192.6.8.124 -server-port 8088 -socks5-host 127.0.0.1 -socks5-port 9051
To crawl directly using the CLI and output the results to an Excel file, use the following command:
go run cmd/main/gotor.go -url https://example.com -depth 2 -d
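To make the -depth and -f options more concrete, here is a small sketch of how a crawl result might be cut off at a given depth and printed as a flat list or an indented tree. The Node type, the render function, and the convention that the starting URL sits at depth 0 are all assumptions for this example; the program's real data structures and output layout may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical crawl result: a URL and the links found on it.
type Node struct {
	URL      string
	Children []*Node
}

// render walks the tree depth-first and stops descending at maxDepth
// (the root counts as depth 0 in this sketch). With format "tree" each
// level is indented by two spaces; any other format yields a flat list.
func render(root *Node, maxDepth int, format string) string {
	var b strings.Builder
	var walk func(n *Node, depth int)
	walk = func(n *Node, depth int) {
		if format == "tree" {
			b.WriteString(strings.Repeat("  ", depth))
		}
		b.WriteString(n.URL)
		b.WriteString("\n")
		if depth >= maxDepth {
			return
		}
		for _, c := range n.Children {
			walk(c, depth+1)
		}
	}
	walk(root, 0)
	return b.String()
}

func main() {
	// Hand-built sample data; no crawling happens in this sketch.
	root := &Node{URL: "https://example.com", Children: []*Node{
		{URL: "https://example.com/a", Children: []*Node{
			{URL: "https://example.com/a/b"},
		}},
		{URL: "https://example.com/c"},
	}}
	fmt.Print(render(root, 2, "tree"))
}
```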
To run the server using Docker, a convenience script build.sh is provided. This script builds a Docker network service for Tor and connects it to the "gotor" Docker container. The script uses the port set in the SOCKS5_PORT environment variable, so make sure no other service is using that port.
./scripts/build.sh
./scripts/destroy.sh
This project includes comprehensive code comments to facilitate documentation generation with godoc. To generate and access documentation, use the following command:
godoc -v -http=:6060
This will make the documentation available at http://127.0.0.1:6060.
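For readers unfamiliar with the convention, the sketch below shows the comment layout that Go documentation tools read: a comment placed directly above a declaration, starting with the name it describes. The function NormalizeDepth is a hypothetical example written for this README and is not claimed to exist in the project.

```go
// Command sketch illustrates Go doc comments. This package comment sits
// directly above the package clause and serves as the overview text.
package main

import "fmt"

// NormalizeDepth returns depth unchanged when it is at least 1 and
// returns 1 otherwise. It is a hypothetical helper used only to show
// where a doc comment goes and how it is phrased.
func NormalizeDepth(depth int) int {
	if depth < 1 {
		return 1
	}
	return depth
}

func main() {
	fmt.Println(NormalizeDepth(0), NormalizeDepth(3))
}
```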
This project is licensed under the GNU General Public License.
Feel free to contribute, report issues, or suggest improvements!