0xEnders closed this issue 6 months ago.
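What's the link, so that I can try to reproduce it? Also, can you provide more information, such as your operating system, which version of TorBot you're using, how you're executing the application, and your Tor configuration?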
Thanks for the quick reply!
I am trying these links:
http://alphvmmm27o3abo3r2mlmjrpdmzle3rykajqc5xsj7j7ejksbpsa36ad.onion/
http://noescapemsqxvizdxyl7f7rmg5cdjwp33pg2wpmiaaibilb4btwzttad.onion/
Operating system: Ubuntu 22
TorBot version: current dev version (I git cloned it)
How I'm executing the application: python3 torbot -u http://website.onion --depth 2
Tor configuration: default config (sudo apt install tor, then sudo service tor start)
Also, is there a way to crawl based on a text file of email addresses?
You're welcome, and thanks for providing the information. I'll look into it later today or sometime this week. There is no feature to crawl email addresses: the current program operates on HTML retrieved from sites, so I don't know how that would work with email addresses. If you have suggestions for a new feature, feel free to submit a ticket and it'll be looked into. If you already know how the feature should be implemented, you can take a crack at it and submit a pull request to the repo.
Correction: I meant a text file of websites, not email addresses. And thanks for looking into it. I'll go and mess around with the settings and see what happens. Two other things:
Thanks once again!
There's no way for us to crawl multiple websites at once, right?
Not currently. It'd probably be a fairly straightforward feature to implement, but no one has requested it. If you want to know what's possible, check the README. If you have ideas or suggestions, create a new ticket.
Or build it out yourself and submit it if you're capable.
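Until such a feature exists, a small wrapper script can approximate "crawl from a text file of websites". The sketch below is hypothetical and is not part of TorBot: it assumes a plain-text file with one URL per line (blank lines and `#` comments ignored), and it simply reuses the `python3 torbot -u <url> --depth 2` command from this thread, run from the repo root. The names `parse_url_list`, `build_commands`, and `sites.txt` are illustrative only, and the sites are crawled one after another rather than in parallel.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List


def parse_url_list(text: str) -> List[str]:
    """Return one URL per non-empty line, skipping '#' comments and duplicates."""
    urls: List[str] = []
    seen = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line in seen:
            continue
        seen.add(line)
        urls.append(line)
    return urls


def build_commands(urls: List[str], depth: int = 2) -> List[List[str]]:
    """Build one TorBot invocation per URL, mirroring the command used in this thread."""
    return [["python3", "torbot", "-u", url, "--depth", str(depth)] for url in urls]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage from the TorBot repo root: python3 crawl_list.py sites.txt
    text = Path(sys.argv[1]).read_text(encoding="utf-8")
    for cmd in build_commands(parse_url_list(text)):
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=False)  # crawl each site in turn; keep going on failure
```

Shelling out keeps the wrapper independent of TorBot's internals; a proper built-in feature would more likely loop over the URLs inside the crawler itself.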
I checked the URLs, and the reason it's only returning the host domain is that all of the links are paths within the same domain, not links to different sites. The scraper only collects unique host domains from fully qualified URIs.
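To illustrate why a page full of internal links yields a single result, here is a minimal sketch of a "unique hosts from fully qualified URIs" filter. It is an approximation of the behaviour described above, not TorBot's actual code; `unique_hosts` is a hypothetical name and `example.onion` is a placeholder domain. Relative paths such as `/about` have no scheme or host, so they are dropped, and absolute links back to the same site collapse into one host entry.

```python
from typing import List
from urllib.parse import urlparse


def unique_hosts(hrefs: List[str]) -> List[str]:
    """Keep only fully qualified URIs and reduce them to unique scheme://host values."""
    hosts: List[str] = []
    for href in hrefs:
        parts = urlparse(href)
        if not parts.scheme or not parts.netloc:
            continue  # relative paths like "/about" or "posts/1" carry no host, so they are skipped
        host = f"{parts.scheme}://{parts.netloc}"
        if host not in hosts:
            hosts.append(host)  # same-site absolute links all map to this one entry
    return hosts


# Every link points into the same site, so only one host survives.
print(unique_hosts(["http://example.onion/", "/about", "posts/1", "http://example.onion/contact"]))
```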
I'll look into modifying the feature to identify paths.
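One possible direction for identifying paths, sketched under assumptions rather than taken from TorBot: resolve each `href` against the page's own URL with `urllib.parse.urljoin`, strip `#fragment` parts, and keep unique pages on the same host. The names `LinkCollector` and `same_site_pages` are hypothetical, the standard-library `html.parser` stands in for whatever parser TorBot actually uses, and `example.onion` is a placeholder.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_pages(base_url: str, html: str) -> List[str]:
    """Resolve relative hrefs against base_url and keep unique http(s) pages on the same host."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    pages: List[str] = []
    for href in collector.hrefs:
        # "/about" -> "http://example.onion/about"; "#section" parts are removed
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == base_host and absolute not in pages:
            pages.append(absolute)
    return pages
```

Each page returned this way could then be queued for the next crawl depth, which is what would let `--depth 2` actually descend into a site whose links are all internal paths.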
Hi guys,
I was following the guide step by step. However, when I tried crawling a particular link, I only got that link back, even though navigating the site manually in Tor shows that there are multiple other links. I have tried a few different websites but still have the same issue. I'm unsure whether it's because of my settings or a bug.
Please advise.