Replace the Python internals with a Go HTTP service that performs the web crawling. This takes advantage of Go's speed and built-in concurrency, and the speed increase is significant.
Fix the -i/--info information flag. The response returned from an HTTP request made with the requests library is an object, so its text property must be used to get the HTML.
Links have a new status color: yellow now indicates that the link was redirected. (Maybe there should be a small indicator somewhere?)
Fix the depth argument. Previously it did not work correctly, especially when creating trees.
Change the version to 2.0.0, since this is a major update that is not backwards compatible (it requires a new server).
The Go service currently has to be started by the user before the program can be used. This could be resolved either by hosting it on a public domain or by shipping an rpm/executable that is easy to run.
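The -i/--info fix and the redirect color can be illustrated with a minimal sketch. It assumes the response behaves like the object returned by requests.get: the HTML is in response.text, and any redirects that were followed are listed in response.history. The FakeResponse stand-in, the page_html and link_color helpers, and every color other than yellow are hypothetical; how this PR actually detects redirects is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in exposing the attributes used from requests.Response."""
    text: str
    status_code: int = 200
    history: list = field(default_factory=list)


def page_html(response):
    # The response is an object, not a string: the HTML is in its .text property.
    return response.text


def link_color(response):
    # Assumed color scheme; only "yellow means redirected" comes from this PR.
    if response.history:
        return "yellow"
    if response.status_code == 200:
        return "green"
    return "red"


direct = FakeResponse(text="<html><title>Example</title></html>")
redirected = FakeResponse(
    text="<html></html>",
    history=[FakeResponse(text="", status_code=301)],
)

print(page_html(direct))
print(link_color(direct), link_color(redirected))
```

Checking history rather than the final status code matters because a redirected request normally ends with a 200 response, so the status code alone would hide the redirect.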
Metrics
Searching https://www.google.com at a depth of 2 (results are from the time command).
use_gotor:
dev:
Requires https://github.com/KingAkeem/gotor/pull/19
How to run
Start the goTor service with: go run main.go -server
Explanation of Changes
The main goal of this PR is to improve the performance of web crawling operations such as building trees.
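To illustrate what building a tree to a given depth involves, here is a minimal sketch of a depth-limited crawl. The in-memory LINKS graph stands in for real HTTP fetches, and build_tree and max_depth are hypothetical helpers, not the code in this PR or in goTor.

```python
# Hypothetical link graph standing in for pages fetched over HTTP.
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": ["e"],
    "e": [],
}


def build_tree(url, depth, seen=None):
    """Return a nested dict of links reachable from url within depth hops."""
    seen = set() if seen is None else seen
    seen.add(url)
    node = {"url": url, "children": []}
    if depth <= 0:
        return node  # depth budget used up: do not follow this page's links
    for link in LINKS.get(url, []):
        if link in seen:
            continue  # skip pages already visited to avoid cycles
        node["children"].append(build_tree(link, depth - 1, seen))
    return node


def max_depth(node):
    """Number of hops from this node to its deepest descendant (0 for a leaf)."""
    return max((1 + max_depth(child) for child in node["children"]), default=0)


tree = build_tree("a", 2)
print(max_depth(tree))
```

Each recursive call fetches one page, so the number of requests grows quickly with depth. That is the part of the work where a concurrent crawler can be expected to help most.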
Tasks left