g0pherzilla opened this issue 4 years ago (status: Open)
Prehistory: in 2018, an enthusiast scanned Gopherspace with your tool and created a map: https://ibb.co/m8LWWr3 It is more extensive than yours because multiple entry points were used.
Sorry for the late answer. grawler uses a very naive and not very robust approach to crawling, so it is possible for it to get caught in an endless loop. It's difficult to identify the issue without further debugging. And to be honest, I am no longer that interested in this project, because of the following:
I don't think grawler should be used in its current state. It is not a well-behaved crawler. It is not formalized, but gopher holes may restrict crawlers via a robots.txt, and grawler ignores it. It would also be nice to prevent bursts of requests to a single gopher hole by delaying additional requests in order to spread out the load.
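For illustration, here is a minimal sketch (not grawler's actual code) of the two politeness measures just described: checking a gopher hole's robots.txt before crawling it, and delaying consecutive requests to the same host. The host name, the "robots.txt" selector, and the 2-second delay are illustrative assumptions; a real crawler would also need per-path robots rules and concurrency-safe state.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"strings"
	"time"
)

// lastRequest remembers when each host was last contacted (single-goroutine
// sketch; a concurrent crawler would need to guard this map with a mutex).
var lastRequest = map[string]time.Time{}

const perHostDelay = 2 * time.Second // assumed politeness delay

// fetch performs a raw gopher request: wait if the host was contacted
// recently, connect, send the selector, and read until the server closes.
func fetch(host, selector string) (string, error) {
	if wait := perHostDelay - time.Since(lastRequest[host]); wait > 0 {
		time.Sleep(wait)
	}
	lastRequest[host] = time.Now()

	conn, err := net.DialTimeout("tcp", host, 10*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintf(conn, "%s\r\n", selector); err != nil {
		return "", err
	}
	data, err := io.ReadAll(conn)
	return string(data), err
}

// disallowed is a very rough robots.txt check: a blanket "Disallow: /"
// is taken to mean the hole does not want to be crawled at all.
func disallowed(robots string) bool {
	for _, line := range strings.Split(robots, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "Disallow: /") {
			return true
		}
	}
	return false
}

func main() {
	host := "gopher.example.org:70" // hypothetical gopher hole

	robots, err := fetch(host, "robots.txt")
	if err == nil && disallowed(robots) {
		fmt.Println("robots.txt forbids crawling, skipping", host)
		return
	}

	listing, err := fetch(host, "") // empty selector = root menu
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Printf("fetched %d bytes from %s\n", len(listing), host)
}
```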
Until these issues are fixed, I would say: don't use this software.
If someone is interested in fixing these issues, I will happily transfer maintainership/ownership of this project.
Hello. If the entry point is zaledia.com, why does the crawler not find all the links and instead get stuck on zenalio.ch? Maybe it depends on the number of threads? This is how the crawler was launched:
The grawler.dot content after the scan is complete:
Would it be more efficient to modify main.go so that it reads the entry points from an array?
Example:
.....
What if there are a lot of references? (See the sketch below.)
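As a rough illustration of that idea, here is a minimal sketch (not the actual main.go) of seeding the crawl from an array of entry points. The host list and the crawl() function are hypothetical placeholders standing in for grawler's existing single-entry-point logic.

```go
package main

import "fmt"

func main() {
	// Multiple entry points, so the crawl is not limited to whatever is
	// reachable from a single starting hole.
	entryPoints := []string{
		"zaledia.com:70",
		"gopher.floodgap.com:70",
		"sdf.org:70",
	}

	for _, host := range entryPoints {
		fmt.Println("seeding crawl with", host)
		crawl(host) // hypothetical stand-in for the existing crawl logic
	}
}

// crawl is a placeholder for whatever grawler currently does with one host.
func crawl(host string) {
	// ...
}
```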
After 2 hours: crawling without parameters took longer, but the content of the grawler.dot file remained unchanged.