DedSecInside / TorBot

Dark Web OSINT Tool
Other
2.73k stars 511 forks

Screenshot capture feature #270

Open: PSNAppz opened this issue 1 year ago

PSNAppz commented 1 year ago

Is your feature request related to a problem? Please describe.
Screenshot capture would be a useful addition to an OSINT tool: it lets the tool take screenshots of the pages it crawls and save them to a database or the file system. This creates a visual record of the crawled pages, which helps document the results of a crawl. The screenshots can also form an archive of crawled pages, which is useful for analyzing how those pages change over time.
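As a rough illustration of the archive idea, here is a minimal sketch that keeps every capture in SQLite, so repeated crawls of the same URL accumulate a history. The schema and the names open_archive, store_screenshot and history are hypothetical and not part of TorBot; storing the PNG bytes as BLOBs is only one option (saving files to disk and recording just the path is another).

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per capture, so repeated crawls of the same
# URL build up a history that can be compared over time.
SCHEMA = """
CREATE TABLE IF NOT EXISTS screenshots (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    url         TEXT NOT NULL,
    captured_at TEXT NOT NULL,   -- ISO 8601 UTC timestamp
    png         BLOB NOT NULL
)
"""


def open_archive(path=":memory:"):
    """Open (or create) the screenshot archive at `path`."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_screenshot(conn, url, png_bytes, captured_at=None):
    """Insert one capture of `url` and return the new row id."""
    captured_at = captured_at or datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO screenshots (url, captured_at, png) VALUES (?, ?, ?)",
            (url, captured_at, png_bytes),
        )
    return cur.lastrowid


def history(conn, url):
    """Return (captured_at, png) pairs for `url`, oldest capture first."""
    rows = conn.execute(
        "SELECT captured_at, png FROM screenshots WHERE url = ? ORDER BY id",
        (url,),
    )
    return rows.fetchall()
```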

Describe the solution you'd like
With this feature, the tool could take screenshots at different resolutions and viewport sizes, and even capture the full page, using a library such as Puppeteer or Selenium.
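A minimal sketch of the Selenium route in Python, assuming Selenium 4 with Firefox and geckodriver installed and a Tor SOCKS5 proxy listening on 127.0.0.1:9050 (TorBot's own proxy settings may differ). The viewport presets, file naming and function names are illustrative assumptions. Firefox is used here because Selenium's Firefox driver provides get_full_page_screenshot_as_file for whole-page captures, while save_screenshot captures only the visible viewport.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical viewport presets (width, height) in pixels.
VIEWPORTS = {
    "desktop": (1920, 1080),
    "laptop": (1366, 768),
    "mobile": (390, 844),
}


def capture_plan(url, out_dir, viewports=VIEWPORTS):
    """Build (viewport_name, (w, h), output_path) jobs for one URL.

    Kept free of browser code so it can be tested without launching Firefox.
    """
    host = urlsplit(url).hostname or "page"
    out = Path(out_dir)
    return [(name, size, out / f"{host}_{name}.png") for name, size in viewports.items()]


def capture(url, out_dir, full_page=True, socks_port=9050):
    """Screenshot `url` at every preset size through a local Tor SOCKS proxy.

    Assumes Selenium 4, Firefox + geckodriver, and Tor on 127.0.0.1:<socks_port>.
    """
    from selenium import webdriver  # imported lazily; only needed for real captures

    opts = webdriver.FirefoxOptions()
    opts.add_argument("-headless")
    # Manual proxy config; remote DNS lets Tor resolve .onion hostnames.
    opts.set_preference("network.proxy.type", 1)
    opts.set_preference("network.proxy.socks", "127.0.0.1")
    opts.set_preference("network.proxy.socks_port", socks_port)
    opts.set_preference("network.proxy.socks_version", 5)
    opts.set_preference("network.proxy.socks_remote_dns", True)

    driver = webdriver.Firefox(options=opts)
    try:
        paths = []
        for _name, (w, h), path in capture_plan(url, out_dir):
            path.parent.mkdir(parents=True, exist_ok=True)
            # Sets the browser window size; the rendered viewport is slightly smaller.
            driver.set_window_size(w, h)
            driver.get(url)  # reload so responsive layouts adapt to the new size
            if full_page:
                driver.get_full_page_screenshot_as_file(str(path))  # Firefox only
            else:
                driver.save_screenshot(str(path))  # visible viewport only
            paths.append(path)
        return paths
    finally:
        driver.quit()
```

Splitting the job planning from the browser calls keeps naming and viewport choices easy to change or test, and the same plan could be handed to a different backend such as Puppeteer.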

Describe alternatives you've considered
N/A

Additional context
Screenshots can also be used to create a visual comparison of a page before and after a specific event.
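As a first pass at before/after comparison, here is a sketch that only detects whether two captures differ at all, by comparing SHA-256 digests of the image files. It is a deliberately crude stand-in: any change in pixels or PNG encoding counts as a difference, and dynamic content such as ads or clocks will trigger it too. Locating where a page changed would need a pixel-level diff, for example Pillow's ImageChops.difference followed by getbbox(). The function names are hypothetical.

```python
import hashlib


def file_digest(path):
    """SHA-256 hex digest of a file's bytes, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed(before, after):
    """True if the two screenshot files differ byte-for-byte."""
    return file_digest(before) != file_digest(after)
```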

KingAkeem commented 1 year ago

If anyone wants to take on this task before I do, here's some context.

You should make use of the LinkTree class, which uses treelib to construct a tree data structure that can be printed, saved to a file, and searched using tree operations. https://github.com/DedSecInside/TorBot/blob/dev/torbot/modules/linktree.py

Using the class requires passing the root URL of the tree and the depth to which the tree should be built:

tree = LinkTree(root="https://www.example.com", depth=2)  # builds the tree on instantiation
tree.show()  # prints the tree to stdout
tree.save("test.txt")  # saves the tree to `test.txt`
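To make the depth parameter concrete, here is a minimal sketch of a depth-limited breadth-first crawl, assuming the root sits at depth 0 and links are followed until the requested depth is reached. This is not LinkTree's actual implementation (which, as noted later in the thread, has since been refactored); build_link_tree and the get_links callback are hypothetical names, with get_links standing in for "fetch the page and extract its links".

```python
from collections import deque


def build_link_tree(root, depth, get_links):
    """Breadth-first crawl from `root`, following links up to `depth` levels.

    Returns {url: parent_url} with the root mapped to None. Insertion order
    guarantees every parent appears before its children.
    `get_links(url)` is a hypothetical callback returning the URLs on a page.
    """
    parents = {root: None}
    queue = deque([(root, 0)])
    while queue:
        url, level = queue.popleft()
        if level == depth:
            continue  # don't expand pages at the maximum depth
        for link in get_links(url):
            if link not in parents:  # skip already-seen pages (handles cycles)
                parents[link] = url
                queue.append((link, level + 1))
    return parents
```

Because parents always precede children in the returned dict, the result could be turned into a treelib tree by iterating it and calling tree.create_node(url, url, parent=parent) for each entry.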

The tree nodes currently store only the URL, but treelib lets you attach arbitrary data to a node through its data attribute. https://treelib.readthedocs.io/en/latest/index.html#advanced-usage

import requests
from treelib import Tree


class WebMetadata:
    """Extra data attached to a tree node: the page's HTML and response headers."""
    def __init__(self, html, headers):
        self.html = html
        self.headers = headers

# using treelib library
tree = Tree()
resp = requests.get("https://www.example.com")
tree.create_node("root", "root", data=WebMetadata(resp.text, resp.headers))  # passing html and headers
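Building on the WebMetadata example, here is a sketch of a node payload that also records where a page's screenshot is stored. PageData and screenshot_path are hypothetical names. Hashing the URL is one way to get fixed-length, filesystem-safe file names, since URLs contain characters such as / and ? that are awkward in paths.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class PageData:
    """Hypothetical payload for a treelib node's `data` attribute.

    Extends the WebMetadata idea with the location of the page's screenshot.
    """
    url: str
    html: str = ""
    headers: dict = field(default_factory=dict)
    screenshot: Optional[Path] = None


def screenshot_path(url, out_dir="screenshots"):
    """Stable, filesystem-safe PNG path for `url` (first 16 hex chars of its SHA-256)."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(out_dir) / f"{name}.png"
```

Such an object could be passed the same way as in the example above, e.g. tree.create_node(url, url, parent=parent_url, data=PageData(url, resp.text, dict(resp.headers), screenshot_path(url))).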
pavankalyan224847 commented 8 months ago

Is this issue closed or open?

PSNAppz commented 8 months ago

@pavankalyan224847 This is open and not assigned to anyone.

KingAkeem commented 8 months ago

This comment https://github.com/DedSecInside/TorBot/issues/270#issuecomment-1382431363 is out of date.

The LinkTree class still exists but has been completely refactored; you can find the refactored code here:

https://github.com/DedSecInside/TorBot/blob/dev/torbot/modules/linktree.py

Let me know if you have any questions.

pavankalyan224847 commented 8 months ago

Can you assign this to me? I would like to work on it.

KingAkeem commented 8 months ago

@pavankalyan224847 Done!

KingAkeem commented 8 months ago

Updates?